1. Introduction

The classical random matrix theory is concerned with the asymptotics of various
spectral characteristics of families of random matrices as the dimensions of
the matrices tend to infinity. There are many examples when these characteristics,
which are random variables themselves, converge to certain limit laws.
This includes the celebrated Wigner semicircle law for the empirical measures 
of eigenvalues of random symmetric matrices, Marchenko-Pastur law, which is 
the limit of empirical measures of sample covariance matrices, Tracy-Widom 
distribution describing the limit of the first singular values of a sequence of 
random matrices, etc. [TJ. These limits are of paramount importance, yet in 
applications one usually needs information about the behavior of such characteristics
for large, but fixed n. For instance, in problems in convex geometry
one constructs a random section of an N-dimensional convex body by taking
the kernel or the range of a certain random matrix. Random matrices arise 
also in analysis of rates of convergence of computer science algorithms. In 
both cases, the dimension of the ambient space remains fixed, and one seeks 
explicit estimates of probabilities in terms of the dimension. For such problems 
knowing the limit behavior is of little help. 

The problems involving estimates for a fixed finite dimension arise in the 
classical random matrix theory as well. One of the main approaches in deriving 
the limit laws is based on analysis of the Stieltjes transform of measures pp. 
To derive the convergence of Stieltjes transforms, one frequently has to provide 
explicit bounds on the smallest singular value of a random matrix of a fixed 
size, which holds with high probability. This need arises, e.g., in derivation of 
the circular law and the single ring theorem. 

These questions led to the development of a non-asymptotic theory of random
matrices, which provides probabilistic bounds for eigenvalues, singular values,
etc. for random matrices of a large fixed size. The situation is roughly parallel
to that arising for the sums of i.i.d. random variables, where the asymptotic 
and non-asymptotic results go hand in hand. The asymptotic behavior of the 
averages of n i.i.d. random variables is governed by the Strong Law of Large 
Numbers establishing the almost sure convergence to the expectation. Yet, to 
assert that the average of a large number of random variables is close to the 
expectation, we need a non-asymptotic version, e.g. Hoeffding inequality. This 
inequality yields a subgaussian bound for the large deviations (see the details 
below). Such behavior suggests that the limit distribution of the deviation 
should be normal, which leads to an asymptotic result, the Central Limit 
Theorem (CLT). To use the CLT in evaluation of probabilities for random 
sums, we need its non-asymptotic version, namely the Berry-Esseen Theorem. 
This theorem provides in turn a crucial step in deriving another fundamental 
asymptotic result, the Law of Iterated Logarithm. 

These notes discuss the methods of the non-asymptotic approach to the
random matrix theory. We do not attempt to provide an exhaustive list of 
references (a reader can check the surveys [5], [27] . and [Hi]). Instead we con- 
centrate on three essentially different examples, with the aim of presenting 
the methods and results in a maximally self-contained form. This approach 
inevitably leaves out several important recent developments, such as invertibil- 
ity of random symmetric matrices [19] , applications to the Circular Law 
[HJ EH EH]) and concentration for random determinants ED]- Yet, by re- 
stricting ourselves to a few results, we will be able to give a relatively complete 
picture of the ideas and methods involved in their proofs. We start with an
introduction to subgaussian random variables in Section 3. In Sections 5-7 we
obtain quantitative bounds for invertibility of random matrices with i.i.d.
entries. As will be shown in Section 6, the arithmetic structures play a crucial
role here. Section 8 studies a question arising in geometric functional analysis.
Here the ambient space is Banach, and the approach combines the methods 
of the previous sections with the functional-analytic considerations. We will 
also touch upon majorising measures, which are a powerful tool for estimating 
suprema of random processes. Section 9 contains another quantitative invertibility
result. Here we discuss a random unitary or orthogonal perturbation
of a fixed matrix. Unlike in the first example, the arithmetic structure plays 
no role in this problem. The main difficulty is the dependence between the 
entries of a random matrix, and the method is based on the introduction of 
perturbations with independent entries. 

gin, and Artem Zvavitch for their hospitality. The author is also grateful to

2. NOTATION AND BASIC DEFINITIONS

We shall consider random matrices of high order with independent entries.
For simplicity, we shall assume that the entries are centered ($\mathbb{E} a_{ij} = 0$) and
identically distributed (both conditions may be relaxed).

Throughout these notes $\|\cdot\|_p$ denotes the $\ell_p$ norm
\[ \|x\|_p = \Big( \sum_{j=1}^n |x_j|^p \Big)^{1/p}, \]
and $B_p^n$ stands for the unit ball of this norm. The norm of an operator or a
matrix will be denoted by $\|\cdot\|$. We use $S^{n-1}$ for the unit Euclidean sphere. If
$F$ is a finite set, then $|F|$ denotes the cardinality of $F$. Letters $C, C', c$, etc.
denote absolute constants.

If $N \ge n$, then an $N \times n$ matrix $A$ can be viewed as a mapping of $\mathbb{R}^n$ into
$\mathbb{R}^N$. Thus, a random matrix defines a random $n$-dimensional section of $\mathbb{R}^N$.
For geometric applications we need to know that this matrix does not distort
the metric too much. Let us formulate this more precisely:

Definition 2.1. Let $N \ge n$ and let $A$ be an $N \times n$ matrix. The condition
number of the matrix $A$ is
\[ \sigma(A) = \frac{\max_{x \in S^{n-1}} \|Ax\|_2}{\min_{x \in S^{n-1}} \|Ax\|_2}. \]
If $\min_{x \in S^{n-1}} \|Ax\|_2 = 0$, we set $\sigma(A) = \infty$.

The condition number of a matrix can be rewritten in terms of its singular 
values. 

Definition 2.2. Let $N \ge n$ and let $A$ be an $N \times n$ matrix. The singular
values of $A$ are the eigenvalues of $(A^* A)^{1/2}$, arranged in decreasing order:
\[ s_1(A) \ge s_2(A) \ge \dots \ge s_n(A). \]

The singular values of $A$ are the lengths of the semi-axes of the ellipsoid
$A B_2^n$. The first and the last singular values have a clear functional-analytic
meaning:
\[ s_1(A) = \| A : \mathbb{R}^n \to \mathbb{R}^N \|, \]
and
\[ s_n(A) = \min_{x \in S^{n-1}} \|Ax\|_2 = 1 / \| A^{-1} : A\mathbb{R}^n \to \mathbb{R}^n \| \]
whenever $A$ has full rank. In this notation $\sigma(A) = s_1(A)/s_n(A)$.
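As a small illustration of these definitions, the following sketch computes the two singular values and the condition number of an $N \times 2$ matrix from the closed-form eigenvalues of the $2 \times 2$ Gram matrix $A^* A$. The function names and the sample matrix are assumptions made for this example only; they do not come from the notes.

```python
import math

def singular_values_two_columns(rows):
    """Singular values s1 >= s2 of an N x 2 matrix given as a list of rows."""
    # Entries of the 2 x 2 Gram matrix A^T A = [[p, q], [q, r]].
    p = sum(a * a for a, _ in rows)
    q = sum(a * b for a, b in rows)
    r = sum(b * b for _, b in rows)
    # Eigenvalues of a symmetric 2 x 2 matrix in closed form.
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    s1 = math.sqrt(mean + radius)
    s2 = math.sqrt(max(mean - radius, 0.0))
    return s1, s2

def condition_number(rows):
    """sigma(A) = s1 / s2, with the convention sigma(A) = infinity if s2 = 0."""
    s1, s2 = singular_values_two_columns(rows)
    return math.inf if s2 == 0.0 else s1 / s2

# Hypothetical 3 x 2 example: the image of the unit disc is an ellipse
# with semi-axes 4 and 3.
A = [(3.0, 0.0), (0.0, 4.0), (0.0, 0.0)]
print(singular_values_two_columns(A), condition_number(A))
```

For matrices with more columns one would normally call a numerical SVD routine; the closed form is used here only to keep the sketch self-contained.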

Therefore, to bound the condition number, we have to estimate the first
singular value from above, and the last one from below. For matrices with
i.i.d. random entries the first singular value is the most robust. It can be
estimated using a simple $\varepsilon$-net argument, as will be shown in Proposition 4.4.
The last singular value presents a bigger challenge. We will obtain its bounds
for "tall" rectangular matrices in Section 4 and for square matrices in Sections
5-7.

3. SUBGAUSSIAN RANDOM VARIABLES

In this section we introduce an important class of random variables with 
strong tail decay properties. This class contains the normal variables, as well 
as all bounded random variables. 

Definition 3.1. Let $\xi$ be a random variable and let $v > 0$. We shall call $\xi$
$v$-subgaussian if there exists a constant $C$ such that for any $t > 0$,
\[ \mathbb{P}(|\xi| > t) \le C e^{-v t^2}. \]
A random variable $\xi$ is called centered if $\mathbb{E}\xi = 0$.

If the parameter $v$ is an absolute constant, we call a $v$-subgaussian random
variable subgaussian. We shall assume that the random variable $\xi$ is
non-degenerate, i.e. $\mathrm{Var}(\xi) > 0$.

The subgaussian condition can be formulated in a number of different ways. 

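Theorem 3.2. Let $X$ be a random variable. The following conditions are
equivalent:

(1) $X$ is subgaussian;

(2) $\exists a > 0$: $\mathbb{E} e^{aX^2} < +\infty$ ($\psi_2$-condition);

(3) $\exists B, b > 0$ $\forall \lambda \in \mathbb{R}$: $\mathbb{E} e^{\lambda X} \le B e^{\lambda^2 b}$ (Laplace transform condition);

(4) $\exists K > 0$ $\forall p \ge 1$: $(\mathbb{E}|X|^p)^{1/p} \le K\sqrt{p}$ (moment condition).

Moreover, if $X$ is a centered random variable, (3) can be rewritten as

(3)' $\exists b' > 0$ $\forall \lambda \in \mathbb{R}$: $\mathbb{E} e^{\lambda X} \le e^{\lambda^2 b'}$.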

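Proof. The proof is a series of elementary calculations.

(1) $\Rightarrow$ (2) Let $a < v$. By the integral distribution formula,
\[ \mathbb{E} e^{aX^2} = 1 + \int_0^\infty 2at\, e^{at^2} \cdot \mathbb{P}(|X| > t)\, dt \le 1 + \int_0^\infty 2at \cdot C e^{-(v-a)t^2}\, dt < +\infty. \]

(2) $\Rightarrow$ (3) Let $\lambda$ be any real number. Then
\[ \mathbb{E} e^{\lambda X} = \mathbb{E} e^{\lambda X - aX^2} e^{aX^2} \le \sup_{t \in \mathbb{R}} e^{\lambda t - at^2} \cdot \mathbb{E} e^{aX^2} \le B e^{\lambda^2/4a}. \]

(3) $\Rightarrow$ (4) Set $\lambda = \sqrt{p}$. Replacing, as before, the function by its supremum,
we get
\[ \mathbb{E}|X|^p \le \sup_{t > 0} t^p e^{-\lambda t} \cdot \mathbb{E} e^{\lambda |X|} \le \Big( \frac{p}{\lambda e} \Big)^p \cdot C e^{pb}. \]
Since $\lambda = \sqrt{p}$, taking $p$-th roots yields (4).

(4) $\Rightarrow$ (1) Assume first $t \ge eK$. Choose $p \ge 1$ so that $K\sqrt{p}/t = e^{-1}$. Then
\[ \mathbb{P}(|X| > t) \le \frac{\mathbb{E}|X|^p}{t^p} \le \Big( \frac{K\sqrt{p}}{t} \Big)^p = e^{-p} = e^{-v t^2}, \]
where $v = e^{-2} K^{-2}$. This proves (1) for $t \ge eK$. Setting $C = e$ automatically
guarantees that (1) holds for $0 < t < eK$ as well.

(3)' We will assume that (3) holds with $B > 1$ since otherwise the statement is
trivial. Assume first that $X$ is symmetric. For large values of $\lambda$, we can derive
(3) with constant $B = 1$ by changing the parameter $b$. Indeed, set $\lambda_0 = \sqrt{2a}$
and choose $\tilde b > b$ so that $B e^{\lambda_0^2 b} \le e^{\lambda_0^2 \tilde b}$. This guarantees that (3) holds for all
$\lambda$ such that $|\lambda| \ge \lambda_0$ with $B = 1$ and $b$ replaced by $\tilde b$.

If $\lambda^2 < 2a$, then by the $\psi_2$-condition and Hölder's inequality,
\[ \mathbb{E} e^{\lambda X} = \mathbb{E}\, \frac{1}{2}\big( e^{\lambda X} + e^{-\lambda X} \big) \le \mathbb{E} e^{\lambda^2 X^2/2} \le \big( \mathbb{E} e^{aX^2} \big)^{\lambda^2/2a} \le \exp\Big( \frac{c \lambda^2}{2a} \Big). \]
Finally, we set $b' = \max(c/2a, \tilde b)$.

If $X$ is merely centered, we use a simple symmetrization. Let $X'$ be an
independent copy of $X$. Then by Jensen's inequality,
\[ \mathbb{E} e^{\lambda X} = \mathbb{E} e^{\lambda (X - \mathbb{E}X')} \le \mathbb{E} e^{\lambda (X - X')}, \]
where $X - X'$ is a symmetric subgaussian random variable. □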
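The moment condition (4) can be checked numerically for a standard normal variable $g$, for which $\mathbb{E}|g|^p = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$. The sketch below, which is an illustration and not part of the proof, evaluates the ratio $(\mathbb{E}|g|^p)^{1/p}/\sqrt{p}$ for $p = 1, \dots, 100$; the function names are assumptions made for this example.

```python
import math

def gaussian_abs_moment(p):
    """E|g|^p for a standard normal g: 2^(p/2) * Gamma((p+1)/2) / sqrt(pi)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1) / 2.0) / math.sqrt(math.pi)

def moment_ratio(p):
    """(E|g|^p)^(1/p) / sqrt(p); condition (4) says this stays bounded in p."""
    return gaussian_abs_moment(p) ** (1.0 / p) / math.sqrt(p)

# The ratio stays below 1 on this range, so K = 1 works in (4) for these p.
ratios = [moment_ratio(p) for p in range(1, 101)]
print(max(ratios), min(ratios))
```

The boundedness of this ratio is exactly what distinguishes subgaussian variables from, say, exponential ones, whose $p$-th moment norms grow linearly in $p$.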

Remark. The $\psi_2$-condition turns the set of centered subgaussian random
variables into a normed space. Define the function $\psi_2 : \mathbb{R} \to \mathbb{R}$ by
$\psi_2(t) = \exp(t^2) - 1$. Then for a non-zero random variable $X$ set
\[ \|X\|_{\psi_2} = \inf\{ s > 0 \mid \mathbb{E}\psi_2(X/s) \le 1 \}. \]
The subgaussian random variables equipped with this norm form an Orlicz
space (see [XT] for the details).

To estimate the first singular value, we have to prove a large deviation in- 
equality for a linear combination of independent subgaussian random variables. 
Note that a linear combination of independent Gaussian random variables is 
Gaussian. We prove below that a linear combination of independent subgaus- 
sian random variables is subgaussian. 

Theorem 3.3. Let $X_1, \dots, X_n$ be independent centered subgaussian random
variables. Then for any $a_1, \dots, a_n \in \mathbb{R}$ and any $t > 0$,
\[ \mathbb{P}\Big( \Big| \sum_{j=1}^n a_j X_j \Big| > t \Big) \le 2 \exp\Big( - \frac{c t^2}{\sum_{j=1}^n a_j^2} \Big). \]

Proof. Set $v_j = a_j / \big( \sum_{k=1}^n a_k^2 \big)^{1/2}$. We have to show that the random variable
$Y = \sum_{j=1}^n v_j X_j$ is subgaussian. Let us check the Laplace transform condition
(3)'. For any $\lambda \in \mathbb{R}$,
\[ \mathbb{E} \exp\Big( \lambda \sum_{j=1}^n v_j X_j \Big) = \prod_{j=1}^n \mathbb{E} \exp(\lambda v_j X_j) \le \prod_{j=1}^n \exp(\lambda^2 v_j^2 b) = \exp\Big( \lambda^2 b \sum_{j=1}^n v_j^2 \Big) = e^{\lambda^2 b}. \]
The inequality here follows from (3)'. Note that the fact that the constant in
front of the exponent in (3)' is 1 plays the crucial role here. □
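For random signs the constant in Theorem 3.3 can be made explicit: Hoeffding's inequality gives $\mathbb{P}(|\sum_j a_j \varepsilon_j| > t) \le 2\exp(-t^2/(2\sum_j a_j^2))$. The following sketch compares this bound with the exact tail probability, computed by enumerating all sign patterns. The coefficient vector and the function names are assumptions chosen for illustration.

```python
import itertools
import math

def exact_tail(a, t):
    """P(|sum_j a_j * eps_j| > t) for independent random signs eps_j, by enumeration."""
    hits = 0
    for signs in itertools.product((-1, 1), repeat=len(a)):
        if abs(sum(s * c for s, c in zip(signs, a))) > t:
            hits += 1
    return hits / 2 ** len(a)

def hoeffding_bound(a, t):
    """Hoeffding's bound 2 * exp(-t^2 / (2 * sum_j a_j^2)) for random signs."""
    return 2.0 * math.exp(-t * t / (2.0 * sum(c * c for c in a)))

# Hypothetical coefficients; 2^10 sign patterns are enumerated exactly.
a = [1.0, 0.5, 2.0, 1.5, 0.25, 1.0, 0.75, 3.0, 0.1, 1.2]
for t in (1.0, 3.0, 5.0, 8.0):
    print(t, exact_tail(a, t), hoeffding_bound(a, t))
```

The enumeration is exponential in the number of coefficients, so this check is only practical for short vectors.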



Theorem 3.3 can be used to give a very short proof of a classical inequality
due to Khinchin.

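Theorem 3.4 (Khinchin). Let $X_1, \dots, X_n$ be independent centered subgaussian
random variables. For any $p \ge 1$ there exist $A_p, B_p > 0$ such that the
inequality
\[ A_p \Big( \sum_{j=1}^n a_j^2 \Big)^{1/2} \le \Big( \mathbb{E} \Big| \sum_{j=1}^n a_j X_j \Big|^p \Big)^{1/p} \le B_p \Big( \sum_{j=1}^n a_j^2 \Big)^{1/2} \]
holds for all $a_1, \dots, a_n \in \mathbb{R}$.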



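Proof. Without loss of generality, assume that $\big( \sum_{j=1}^n a_j^2 \big)^{1/2} = 1$.
Let $p \ge 2$. Then by Hölder's inequality
\[ \Big( \sum_{j=1}^n a_j^2 \Big)^{1/2} = \Big( \mathbb{E} \Big| \sum_{j=1}^n a_j X_j \Big|^2 \Big)^{1/2} \le \Big( \mathbb{E} \Big| \sum_{j=1}^n a_j X_j \Big|^p \Big)^{1/p}, \]
so $A_p = 1$ (the equality uses that the $X_j$ have unit variance; in general the
constant depends on the variances). By Theorem 3.3, $Y = \sum_{j=1}^n a_j X_j$ is a subgaussian random variable.
Hence,
\[ (\mathbb{E}|Y|^p)^{1/p} \le C\sqrt{p} =: B_p. \]
This is the right asymptotic as $p \to \infty$.

In the case $1 \le p \le 2$ it is enough to prove the inequality for $p = 1$. As
before, by Hölder's inequality, we can choose $B_p = 1$. Applying Khinchin's
inequality with $p = 3$, we get
\[ \mathbb{E}|Y|^2 = \mathbb{E}|Y|^{1/2} \cdot |Y|^{3/2} \le (\mathbb{E}|Y|)^{1/2} \cdot (\mathbb{E}|Y|^3)^{1/2} \le (\mathbb{E}|Y|)^{1/2} \cdot B_3^{3/2} (\mathbb{E}|Y|^2)^{3/4}. \]
Hence,
\[ B_3^{-3} (\mathbb{E}|Y|^2)^{1/2} \le \mathbb{E}|Y|. \]
□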
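For random signs, the $L_p$ norms in Khinchin's inequality can be computed exactly by enumeration. The sketch below does this for a normalized coefficient vector and for $p = 1, 2, 4$. It is an illustration under stated assumptions: the coefficient vector is hypothetical, and the comparison values $1/\sqrt{2}$ (the classical sharp constant for $p = 1$) and $3^{1/4}$ (from $\mathbb{E}Y^4 = 3 - 2\sum_j a_j^4$ for signs) are specific to random signs rather than general subgaussian variables.

```python
import itertools
import math

def lp_norm_of_sign_sum(a, p):
    """(E|sum_j a_j * eps_j|^p)^(1/p) for independent random signs, by enumeration."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=len(a)):
        total += abs(sum(s * c for s, c in zip(signs, a))) ** p
    return (total / 2 ** len(a)) ** (1.0 / p)

# Hypothetical coefficients, normalized so that sum_j a_j^2 = 1.
raw = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
scale = math.sqrt(sum(c * c for c in raw))
a = [c / scale for c in raw]

l1, l2, l4 = (lp_norm_of_sign_sum(a, p) for p in (1, 2, 4))
print(l1, l2, l4)
```

The vector $(1/\sqrt{2}, 1/\sqrt{2})$ attains $\mathbb{E}|Y| = 1/\sqrt{2}$, which shows that the lower constant for $p = 1$ cannot exceed $1/\sqrt{2}$ for random signs.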

4. INVERTIBILITY OF A RECTANGULAR RANDOM MATRIX

We introduce the $\varepsilon$-net argument, which will enable us to bound the condition
number for a random $N \times n$ matrix with independent entries in the case
when $N \gg n$. To simplify the proofs we assume from now on that the entries
of the matrix are centered, subgaussian random variables.

Recall the definition of an $\varepsilon$-net.

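Definition 4.1. Let $(T, d)$ be a metric space. Let $K \subset T$. A set $\mathcal{N} \subset T$ is
called an $\varepsilon$-net for $K$ if
\[ \forall x \in K \ \exists y \in \mathcal{N} \quad d(x, y) < \varepsilon. \]
A set $S \subset K$ is called $\varepsilon$-separated if
\[ \forall x, y \in S,\ x \ne y \quad d(x, y) \ge \varepsilon. \]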

These two notions are closely related. Namely, we have the following elementary
lemma.

Lemma 4.2. Let $K$ be a subset of a metric space $(T, d)$, and let $\mathcal{N} \subset T$ be an
$\varepsilon$-net for $K$. Then

(1) there exists a $2\varepsilon$-net $\mathcal{N}' \subset K$ such that $|\mathcal{N}'| \le |\mathcal{N}|$;

(2) any $2\varepsilon$-separated set $S \subset K$ satisfies $|S| \le |\mathcal{N}|$.

(3) On the other hand, any maximal $\varepsilon$-separated set $S' \subset K$ is an $\varepsilon$-net
for $K$.

We leave the proof of this lemma to the reader as an exercise.

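Lemma 4.3 (Volumetric estimate). For any $\varepsilon < 1$ there exists an $\varepsilon$-net $\mathcal{N} \subset S^{n-1}$
such that
\[ |\mathcal{N}| \le \Big( \frac{3}{\varepsilon} \Big)^n. \]

Proof. Let $\mathcal{N}$ be a maximal $\varepsilon$-separated subset of $S^{n-1}$. By Lemma 4.2 (3), $\mathcal{N}$ is
an $\varepsilon$-net for $S^{n-1}$. For any distinct points $x, y \in \mathcal{N}$,
\[ \Big( x + \frac{\varepsilon}{2} B_2^n \Big) \cap \Big( y + \frac{\varepsilon}{2} B_2^n \Big) = \emptyset. \]
Hence,
\[ |\mathcal{N}| \cdot \mathrm{vol}\Big( \frac{\varepsilon}{2} B_2^n \Big) = \mathrm{vol}\Big( \bigcup_{x \in \mathcal{N}} \Big( x + \frac{\varepsilon}{2} B_2^n \Big) \Big) \le \mathrm{vol}\Big( \Big( 1 + \frac{\varepsilon}{2} \Big) B_2^n \Big), \]
which implies
\[ |\mathcal{N}| \le \Big( 1 + \frac{2}{\varepsilon} \Big)^n \le \Big( \frac{3}{\varepsilon} \Big)^n. \]
□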
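The proof of the volumetric estimate is constructive in spirit: a maximal $\varepsilon$-separated set can be built greedily. The sketch below does this on a finite random sample of $S^2$, which is an assumption made to keep the example finite (the result is an $\varepsilon$-net for the sample, not for the whole sphere), and compares its size with the bound $(1 + 2/\varepsilon)^n$. The names `random_unit_vector` and `greedy_separated_set` are hypothetical helpers.

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform point on S^(n-1), obtained by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]

def greedy_separated_set(points, eps):
    """Keep each point whose distance to all previously kept points is >= eps."""
    kept = []
    for x in points:
        if all(math.dist(x, y) >= eps for y in kept):
            kept.append(x)
    return kept

rng = random.Random(0)
n, eps = 3, 0.5
sample = [random_unit_vector(n, rng) for _ in range(2000)]
net = greedy_separated_set(sample, eps)

# Every rejected sample point is within eps of a kept one, so `net` is an
# eps-net for the sample; being eps-separated, it obeys the volumetric bound.
print(len(net), (1 + 2 / eps) ** n)
```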

Using $\varepsilon$-nets, we prove a basic bound on the first singular value of a random
subgaussian matrix:

Proposition 4.4 (First singular value). Let $A$ be an $N \times n$ random matrix,
$N \ge n$, whose entries are independent copies of a subgaussian random variable.
Then
\[ \mathbb{P}(s_1(A) > t\sqrt{N}) \le e^{-c_0 t^2 N} \quad \text{for } t \ge C_0. \]

Proof. Let $\mathcal{M}$ be a $(1/2)$-net in $S^{N-1}$ and $\mathcal{N}$ be a $(1/2)$-net in $S^{n-1}$. For any
$u \in S^{n-1}$, we can choose an $x \in \mathcal{N}$ such that $\|x - u\|_2 < 1/2$. Then
\[ \|Au\|_2 \le \|Ax\|_2 + \|A\| \cdot \|x - u\|_2 \le \|Ax\|_2 + \frac{1}{2} \|A\|. \]
This shows that $\|A\| \le 2 \sup_{x \in \mathcal{N}} \|Ax\|_2 = 2 \sup_{x \in \mathcal{N}} \sup_{v \in S^{N-1}} \langle Ax, v \rangle$.
Approximating $v$ in a similar way by an element of $\mathcal{M}$, we obtain
\[ \|A\| \le 4 \max_{x \in \mathcal{N},\, y \in \mathcal{M}} |\langle Ax, y \rangle|. \]
By Lemma 4.3, we can choose these nets so that
\[ |\mathcal{N}| \le 6^n, \qquad |\mathcal{M}| \le 6^N. \]
By Theorem 3.3, for every $x \in \mathcal{N}$ and $y \in \mathcal{M}$, the random variable
$\langle Ax, y \rangle = \sum_{j=1}^N \sum_{k=1}^n a_{j,k}\, y_j x_k$ is subgaussian, i.e.,
\[ \mathbb{P}(|\langle Ax, y \rangle| > t\sqrt{N}) \le C_1 e^{-c_1 t^2 N} \quad \text{for } t > 0. \]
Taking the union bound, we get
\[ \mathbb{P}(\|A\| > t\sqrt{N}) \le |\mathcal{N}|\, |\mathcal{M}| \max_{x \in \mathcal{N},\, y \in \mathcal{M}} \mathbb{P}(|\langle Ax, y \rangle| > t\sqrt{N}/4) \le 6^N \cdot 6^N \cdot C_1 e^{-c_1 t^2 N/16} \le e^{-c_0 t^2 N}, \]
provided that $t \ge C_0$ for an appropriately chosen constant $C_0 > 0$. This
completes the proof. □



Proposition 4.4 means that for any $N \ge n$ the first singular value is $O(\sqrt{N})$
with probability close to 1. Thus, the bound for the condition number reduces
to a lower estimate of the last singular value.

To obtain it, we prove an easy estimate for a small ball probability of a sum
of independent random variables.

Lemma 4.5. Let $\xi_1, \dots, \xi_n$ be independent copies of a centered subgaussian
random variable with variance 1. Then there exists $\mu \in (0, 1)$ such that for every
coefficient vector $a = (a_1, \dots, a_n) \in S^{n-1}$ the random sum $S = \sum_{k=1}^n a_k \xi_k$
satisfies
\[ \mathbb{P}(|S| < 1/2) \le \mu. \]

Proof. Let $0 < \lambda < (\mathbb{E}S^2)^{1/2} = 1$. By the Cauchy-Schwarz inequality,
\[ \mathbb{E}S^2 = \mathbb{E}S^2 \mathbf{1}_{[-\lambda, \lambda]}(S) + \mathbb{E}S^2 \mathbf{1}_{\mathbb{R} \setminus [-\lambda, \lambda]}(S) \le \lambda^2 + (\mathbb{E}S^4)^{1/2}\, \mathbb{P}(|S| > \lambda)^{1/2}. \]
This leads to the Paley-Zygmund inequality:
\[ \mathbb{P}(|S| > \lambda) \ge \frac{(\mathbb{E}S^2 - \lambda^2)^2}{\mathbb{E}S^4} = \frac{(1 - \lambda^2)^2}{\mathbb{E}S^4}. \]
By Theorem 3.3, the random variable $S$ is subgaussian, so by condition (4) of Theorem 3.2,
$\mathbb{E}S^4 \le C$. To finish the proof, set $\lambda = 1/2$, which gives $\mu = 1 - 9/(16C)$. □
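The Paley-Zygmund step can be verified exactly for random signs by enumerating all values of $S$. The sketch below computes both sides of the inequality with $\lambda = 1/2$ for a unit coefficient vector. The helper names and the choice of vector are assumptions made for illustration.

```python
import itertools
import math

def sign_sum_distribution(a):
    """All 2^n equally likely values of S = sum_k a_k * xi_k for random signs xi_k."""
    return [sum(s * c for s, c in zip(signs, a))
            for signs in itertools.product((-1, 1), repeat=len(a))]

def paley_zygmund_check(a, lam=0.5):
    """Return (P(|S| > lam), (E S^2 - lam^2)^2 / E S^4), computed exactly."""
    values = sign_sum_distribution(a)
    m = len(values)
    second = sum(v ** 2 for v in values) / m
    fourth = sum(v ** 4 for v in values) / m
    prob = sum(1 for v in values if abs(v) > lam) / m
    lower = (second - lam ** 2) ** 2 / fourth
    return prob, lower

n = 10
a = [1 / math.sqrt(n)] * n   # unit vector, so E S^2 = 1
print(paley_zygmund_check(a))
```

The first returned number should dominate the second; the gap shows how much room the Paley-Zygmund bound leaves for a particular coefficient vector.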



Lemma 4.5 implies the following invertibility estimate for a fixed vector.

Corollary 4.6. Let $A$ be a matrix as in Proposition 4.4. Then there exist
constants $\eta, \nu \in (0, 1)$ such that for every $x \in S^{n-1}$,
\[ \mathbb{P}(\|Ax\|_2 \le \eta N^{1/2}) \le \nu^N. \]

Proof. The coordinates of the vector $Ax$ are independent linear combinations
of i.i.d. subgaussian random variables with coefficients $(x_1, \dots, x_n) \in S^{n-1}$.
Hence, by Lemma 4.5, $\mathbb{P}(|(Ax)_j| < 1/2) \le \mu$ for all $j = 1, \dots, N$.

Assume that $\|Ax\|_2 \le \eta\sqrt{N}$. Then $|(Ax)_j| < 1/2$ for at least $(1 - 4\eta^2)N >
N/2$ coordinates. If $\eta$ is small enough, then the number $M$ of subsets of
$\{1, \dots, N\}$ with at least $(1 - 4\eta^2)N$ elements is less than $\mu^{-N/4}$. Then the
union bound implies
\[ \mathbb{P}(\|Ax\|_2 \le \eta N^{1/2}) \le M \cdot \mu^{N/2} \le \mu^{N/4}. \]
□



Combining this with the $\varepsilon$-net argument, we obtain the estimate for the
smallest singular value of a random matrix whose dimensions are significantly
different.

Proposition 4.7 (Smallest singular value of rectangular matrices). Let $A$ be
an $N \times n$ matrix whose entries are i.i.d. centered subgaussian random variables
with variance 1. There exist $c_1, c_2 > 0$ and $\delta_0 \in (0, 1)$ such that if $n < \delta_0 N$,
then
\[ \mathbb{P}\Big( \min_{x \in S^{n-1}} \|Ax\|_2 \le c_1 N^{1/2} \Big) \le e^{-c_2 N}. \tag{4.1} \]

Proof. Let $\varepsilon > 0$ be chosen later. Let $\mathcal{N}$ be an $\varepsilon$-net in $S^{n-1}$ of cardinality
$|\mathcal{N}| \le (3/\varepsilon)^n$. Let $\eta$ and $\nu$ be the numbers in Corollary 4.6. Then by the
union bound,
\[ \mathbb{P}\big( \exists y \in \mathcal{N} : \|Ay\|_2 < \eta N^{1/2} \big) \le (3/\varepsilon)^n \cdot \nu^N. \tag{4.2} \]
Let $V$ be the event that $\|A\| \le C_0 N^{1/2}$ and $\|Ay\|_2 \ge \eta N^{1/2}$ for all points
$y \in \mathcal{N}$.

Assume that $V$ occurs, and let $x \in S^{n-1}$ be any point. Choose $y \in \mathcal{N}$ such
that $\|y - x\|_2 < \varepsilon$. Then
\[ \|Ax\|_2 \ge \|Ay\|_2 - \|A\| \cdot \|x - y\|_2 \ge \eta N^{1/2} - C_0 N^{1/2} \cdot \varepsilon = \frac{\eta}{2} N^{1/2} \]
if we set $\varepsilon = \eta/(2C_0)$. By (4.2) and Proposition 4.4,
\[ \mathbb{P}(V^c) \le \big( \nu \cdot (3/\varepsilon)^{n/N} \big)^N + e^{-c' N} \le e^{-c_2 N}, \]
if we assume that $n/N \le \delta_0$ for an appropriately chosen $\delta_0 < 1$. This completes
the proof. □



5. INVERTIBILITY OF A SQUARE MATRIX: 
ABSOLUTELY CONTINUOUS ENTRIES 

Until recently, much less has been known about the behavior of the smallest
singular value of a square matrix. In the classic work on numerical inversion
of large matrices, von Neumann and his associates used random matrices to
test their algorithms, and they speculated that
\[ s_n(A) \sim n^{-1/2} \quad \text{with high probability} \tag{5.1} \]
(see [S], pp. 14, 477, 555). In a more precise form, this estimate was conjectured
by Smale [29] and proved by Edelman [6] and Szarek [31] for random
Gaussian matrices $A$, i.e., those with i.i.d. standard normal entries. Edelman's
theorem states that for every $\varepsilon \in (0, 1)$,
\[ \mathbb{P}(s_n(A) \le \varepsilon n^{-1/2}) \sim \varepsilon. \tag{5.2} \]

Conjecture (5.1) for general random matrices was an open problem, unknown
even for the random sign matrices $A$, i.e., those whose entries are $\pm 1$ symmetric
random variables. The first polynomial bound for the smallest singular value
of a random matrix with i.i.d. subgaussian, in particular, $\pm 1$ entries was
obtained in [23]. It was proved that for such a matrix $s_n(A) \ge C n^{-3/2}$ with
high probability. Following that, Tao and Vu proved that if $A$ is a $\pm 1$ random
matrix, then for any $\alpha > 0$ there exists $\beta > 0$ such that $s_n(A) \ge n^{-\beta}$ with
probability at least $1 - n^{-\alpha}$. In [24] the conjecture (5.1) is proved in full
generality under the fourth moment assumption.

Theorem 5.1 (Invertibility: fourth moment). Let $A$ be an $n \times n$ matrix whose
entries are independent centered real random variables with variances at least
1 and fourth moments bounded by $B$. Then, for every $\delta > 0$ there exist $\varepsilon > 0$
and $n_0$ which depend (polynomially) only on $\delta$ and $B$, such that
\[ \mathbb{P}(s_n(A) \le \varepsilon n^{-1/2}) \le \delta \quad \text{for all } n \ge n_0. \]

This shows in particular that the median of $s_n(A)$ is at least of order $n^{-1/2}$.
To show that $s_n(A) \sim n^{-1/2}$ with high probability, one has to prove a matching
upper bound. This was done in [26] for matrices with subgaussian entries and
extended in [41] to matrices whose entries have a finite fourth moment.

Under stronger moment assumptions, more is known about the distribution
of the largest singular value, and similarly one hopes to know more about the
smallest singular value.



One might then expect that the estimate (5.2) for the distribution of the
smallest singular value of Gaussian matrices should hold for all subgaussian
matrices. Note however that (5.2) fails for the random sign matrices, since they
are singular with positive probability. Estimating the singularity probability
for random sign matrices is a longstanding open problem. Even proving that
it converges to $0$ as $n \to \infty$ is a nontrivial result due to Komlós [16]. Later
Kahn, Komlós and Szemerédi [15] showed that it is exponentially small:
\[ \mathbb{P}(\text{random sign matrix } A \text{ is singular}) < c^n \tag{5.3} \]
for some universal constant $c \in (0, 1)$. The often conjectured optimal value of
$c$ is $1/2 + o(1)$ [15], and the best known value $1/\sqrt{2} + o(1)$ is due to Bourgain,
Vu, and Wood [1] (see [33], [35] for earlier results).



Spielman and Teng [30] conjectured that (5.2) should hold for the random
sign matrices up to an exponentially small term that accounts for their
singularity probability:
\[ \mathbb{P}(s_n(A) \le \varepsilon n^{-1/2}) \le \varepsilon + c^n. \]
We prove the Spielman-Teng conjecture up to a coefficient in front of $\varepsilon$.
Moreover, we show that this type of behavior is common for all matrices with
subgaussian i.i.d. entries. For a bound for random matrices with general i.i.d. 
entries see 



Theorem 5.2 (Invertibility: subgaussian). Let $A$ be an $n \times n$ matrix whose
entries are independent copies of a centered subgaussian real random variable.
Then for every $\varepsilon \ge 0$, one has
\[ \mathbb{P}(s_n(A) \le \varepsilon n^{-1/2}) \le C\varepsilon + c^n, \tag{5.4} \]
where $C > 0$ and $c \in (0, 1)$.

Note that setting $\varepsilon = 0$ we recover the result of Kahn, Komlós and
Szemerédi. Also, note that the question whether (5.4) holds for random sign
matrices with coefficient $C = 1$ remains open.

We shall start with an attempt to apply the $\varepsilon$-net argument. Let us consider
an $n \times n$ Gaussian matrix, i.e., a matrix with independent $N(0, 1)$ entries. In
this case, for any $x \in S^{n-1}$, the vector $Ax$ has independent $N(0, 1)$ coordinates,
so it is distributed like the standard Gaussian vector in $\mathbb{R}^n$. Hence, for any
$t > 0$,
\[ \mathbb{P}(\|Ax\|_2 \le t\sqrt{n}) = (2\pi)^{-n/2} \int_{t\sqrt{n} B_2^n} e^{-\|y\|_2^2/2}\, dy \le (2\pi)^{-n/2}\, \mathrm{vol}(t\sqrt{n} \cdot B_2^n) \le (C_1 t)^n. \]
Fix $\varepsilon > 0$. Let $\mathcal{N}$ be an $\varepsilon$-net in $S^{n-1}$ of cardinality $|\mathcal{N}| \le (3/\varepsilon)^n$. Then by
the union bound,
\[ \mathbb{P}\big( \exists x \in \mathcal{N} : \|Ax\|_2 \le t n^{1/2} \big) \le (3/\varepsilon)^n \cdot (C_1 t)^n. \]
To obtain a meaningful estimate we have to require
\[ (3/\varepsilon) \cdot (C_1 t) < 1. \tag{5.5} \]

As in Proposition 4.7, we may assume that $\|A\| \le C_0\sqrt{n}$, since the complement
of this event has an exponentially small probability. Assume that for any
$y \in \mathcal{N}$, $\|Ay\|_2 \ge t\sqrt{n}$. Given $x \in S^{n-1}$, find $y \in \mathcal{N}$ satisfying $\|x - y\|_2 < \varepsilon$.
Then
\[ \|Ax\|_2 \ge \|Ay\|_2 - \|A\| \cdot \|x - y\|_2 \ge t n^{1/2} - C_0 n^{1/2} \cdot \varepsilon. \]
To obtain a non-trivial lower bound, we have to assume that
\[ t > C_0 \varepsilon. \tag{5.6} \]

Unfortunately, the system of inequalities (5.5) and (5.6) turns out to be
inconsistent, and the $\varepsilon$-net argument fails for the square matrix. Nevertheless,
a part of this idea can be salvaged. Namely, if the cardinality of the $\varepsilon$-net
satisfies a better estimate
\[ |\mathcal{N}| \le (\alpha/\varepsilon)^n \tag{5.7} \]
for a small constant $\alpha > 0$, then (5.5) is replaced by $(\alpha/\varepsilon) \cdot (C_1 t) < 1$, and the
system (5.5), (5.6) becomes consistent. While the estimate (5.7) is impossible
for the whole sphere, it can be obtained for a small part of it. This becomes
the first ingredient of our strategy: small parts of the sphere will be handled
by the $\varepsilon$-net argument. However, the "bulk" of the sphere has to be handled
differently.
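To see the obstruction concretely, one can solve both constraints for $t$. The short derivation below is a sketch that only rearranges (5.5), (5.6), and the modified version of (5.5) coming from (5.7). The remark that $3C_0C_1 \ge 1$ is an assumption about the sizes of the constants (for a Gaussian matrix $\|A\|$ is of order $\sqrt{n}$ with a constant that is not small), not a statement taken from the text.

```latex
\begin{align*}
\text{(5.5)} &\iff t < \frac{\varepsilon}{3C_1},
&\text{(5.6)} &\iff t > C_0\,\varepsilon, \\
\text{both hold for some } t
&\iff C_0\,\varepsilon < \frac{\varepsilon}{3C_1}
&&\iff 3\,C_0 C_1 < 1, \\
\text{with (5.7):}\quad
\frac{\alpha}{\varepsilon}\, C_1 t < 1 \ \text{and}\ t > C_0\,\varepsilon
&\iff C_0\,\varepsilon < t < \frac{\varepsilon}{\alpha C_1},
&&\text{which is solvable iff } \alpha\, C_0 C_1 < 1 .
\end{align*}
```

Note that $\varepsilon$ cancels in both comparisons, so shrinking the net does not help; only a smaller base $\alpha$ in the cardinality bound does.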

The proof of Theorem 5.2 for random matrices with i.i.d. subgaussian entries
having a bounded density is presented below.

5.1. Conditional argument. To handle the "bulk", we have to produce an
estimate which holds for all vectors in it simultaneously, without taking the
union bound. Let $x \in S^{n-1}$ be a vector such that $|x_1| \ge n^{-1/2}$. Denote the
columns of the matrix $A$ by $X_1, \dots, X_n$, and let
\[ H_j := \mathrm{span}(X_k \mid k \ne j). \]
Then $Ax = \sum_{k=1}^n x_k X_k$, so
\[ \|Ax\|_2 \ge \mathrm{dist}(Ax, H_1) = \mathrm{dist}(x_1 X_1, H_1) \ge n^{-1/2}\, \mathrm{dist}(X_1, H_1). \tag{5.8} \]
Note that the right-hand side is independent of $x$. Therefore it provides a
uniform lower bound for all $x$ such that $|x_1| \ge n^{-1/2}$. Since any vector
$x \in S^{n-1}$ has a coordinate with absolute value at least $n^{-1/2}$, we can try to
extend this bound to the whole sphere. This approach immediately runs into
a problem: we don't know a priori which of the coordinates of $x$ is big. To
modify this approach we shall pick a random coordinate. To this end we have
to know that the random coordinate is big with relatively high probability.
This is true for vectors which look like the vertices of a discrete cube, but is
obviously false for vectors with small support, i.e. a small number of non-zero
coordinates. This observation leads us to the first decomposition of the sphere:

Definition 5.3 (Compressible and incompressible vectors). Fix $\delta, \rho \in (0, 1)$.
A vector $x \in \mathbb{R}^n$ is called sparse if $|\mathrm{supp}(x)| \le \delta n$. A vector $x \in S^{n-1}$ is called
compressible if $x$ is within Euclidean distance $\rho$ from the set of all sparse
vectors. A vector $x \in S^{n-1}$ is called incompressible if it is not compressible.
The sets of sparse, compressible and incompressible vectors will be denoted by
Sparse, Comp and Incomp respectively.
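The distance from a vector to the set of sparse vectors is attained by keeping its $\lfloor \delta n \rfloor$ largest coordinates and zeroing the rest, so compressibility is easy to test. The sketch below implements this check. The function names and the sample parameters $\delta = 0.25$, $\rho = 0.5$ are assumptions made for illustration, not the values fixed later in the proof.

```python
import math

def distance_to_sparse(x, delta):
    """Euclidean distance from x to the vectors with at most delta * n nonzero coordinates."""
    k = int(math.floor(delta * len(x)))            # allowed support size
    squares = sorted((c * c for c in x), reverse=True)
    return math.sqrt(sum(squares[k:]))             # mass outside the k largest coordinates

def is_compressible(x, delta, rho):
    """Definition 5.3 for a unit vector x: within distance rho of the sparse vectors."""
    return distance_to_sparse(x, delta) <= rho

n = 8
e1 = [1.0] + [0.0] * (n - 1)          # a coordinate vector: sparse, hence compressible
flat = [n ** -0.5] * n                # a normalized cube vertex: incompressible here
print(is_compressible(e1, 0.25, 0.5), is_compressible(flat, 0.25, 0.5))
```

The two test vectors mirror the dichotomy in the text: vectors with small support on one side, and vectors resembling vertices of the discrete cube on the other.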

Using the decomposition of the sphere $S^{n-1} = \mathrm{Comp} \cup \mathrm{Incomp}$, we break
the invertibility problem into two subproblems, for compressible and
incompressible vectors:
\[ \mathbb{P}(s_n(A) \le \varepsilon n^{-1/2}) \le \mathbb{P}\Big( \inf_{x \in \mathrm{Comp}} \|Ax\|_2 \le \varepsilon n^{-1/2} \Big) + \mathbb{P}\Big( \inf_{x \in \mathrm{Incomp}} \|Ax\|_2 \le \varepsilon n^{-1/2} \Big). \tag{5.9} \]
On the set of compressible vectors, we obtain an inequality which is much
stronger than we need.

Lemma 5.4 (Invertibility for compressible vectors). Let $A$ be a random matrix
as in Theorem 5.2. Then there exist $\delta, \rho, c_1, c_2 > 0$ such that
\[ \mathbb{P}\Big( \inf_{x \in \mathrm{Comp}} \|Ax\|_2 \le c_1 n^{1/2} \Big) \le e^{-c_2 n}. \]

Sketch of the proof. Any compressible vector is close to a coordinate subspace
of a small dimension $\delta n$. The restriction of our random matrix $A$ onto such
a subspace is a random rectangular $n \times \delta n$ matrix. Such matrices are well
invertible outside of an event of exponentially small probability, provided that
$\delta$ is small enough (see Proposition 4.7). By taking the union bound over all
coordinate subspaces, we deduce the invertibility of the random matrix on the
set of compressible vectors. □



We shall fix $\delta$ and $\rho$ as in Lemma 5.4 for the rest of the proof.

The incompressible vectors are well spread in the sense that they have many
coordinates of the order $n^{-1/2}$. This observation will allow us to realize the
scheme described at the beginning of this section.

Lemma 5.5 (Incompressible vectors are spread). Let $x \in \mathrm{Incomp}$. Then there
exists a set $\sigma(x) \subset \{1, \dots, n\}$ of cardinality $|\sigma(x)| \ge \nu_1 n$ and such that
\[ \frac{\nu_2}{\sqrt{n}} \le |x_k| \le \frac{\nu_3}{\sqrt{n}} \quad \text{for all } k \in \sigma(x). \]
Here $0 < \nu_1, \nu_2 < 1$ and $\nu_3 > 1$ are constants depending only on the parameters
$\delta, \rho$.

We leave the proof of this lemma to the reader.



The main difficulty in implementing a distance bound like (5.8) is to avoid
taking the union bound. We achieve this in the proof of the next lemma by a
random choice of a coordinate.

Lemma 5.6 (Invertibility via distance). Let $A$ be a random matrix with i.i.d.
entries. Let $X_1, \dots, X_n$ denote the column vectors of $A$, and let $H_k$ denote the
span of all column vectors except the $k$-th. Then for every $\varepsilon > 0$, one has
\[ \mathbb{P}\Big( \inf_{x \in \mathrm{Incomp}} \|Ax\|_2 < \varepsilon \nu_2 n^{-1/2} \Big) \le \frac{1}{\nu_1} \cdot \mathbb{P}(\mathrm{dist}(X_n, H_n) < \varepsilon). \tag{5.10} \]

Proof. Denote
\[ p := \mathbb{P}(\mathrm{dist}(X_k, H_k) < \varepsilon). \]
Note that since the entries of the matrix $A$ are i.i.d., this probability does not
depend on $k$. Then
\[ \mathbb{E}\, |\{ k : \mathrm{dist}(X_k, H_k) < \varepsilon \}| = np. \]
Denote by $U$ the event that the set $\sigma_1 := \{ k : \mathrm{dist}(X_k, H_k) \ge \varepsilon \}$ contains
more than $(1 - \nu_1) n$ elements. Then by Chebyshev's inequality,
\[ \mathbb{P}(U^c) \le \frac{p}{\nu_1}. \]
Assume that the event $U$ occurs. Fix any incompressible vector $x$ and let $\sigma(x)$
be the set from Lemma 5.5. Then $|\sigma_1| + |\sigma(x)| > (1 - \nu_1) n + \nu_1 n = n$, so the
sets $\sigma_1$ and $\sigma(x)$ have nonempty intersection. Let $k \in \sigma_1 \cap \sigma(x)$, so
\[ |x_k| \ge \nu_2 n^{-1/2} \quad \text{and} \quad \mathrm{dist}(X_k, H_k) \ge \varepsilon. \]
Writing $Ax = \sum_{j=1}^n x_j X_j$, we get
\[ \|Ax\|_2 \ge \mathrm{dist}(Ax, H_k) = \mathrm{dist}(x_k X_k, H_k) = |x_k|\, \mathrm{dist}(X_k, H_k) \ge \nu_2 n^{-1/2} \cdot \varepsilon. \]
Summarizing, we have shown that
\[ \mathbb{P}\Big( \inf_{x \in \mathrm{Incomp}} \|Ax\|_2 < \varepsilon \nu_2 n^{-1/2} \Big) \le \mathbb{P}(U^c) \le \frac{p}{\nu_1}. \]
This completes the proof. □
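The deterministic inequality behind this argument, $\|Ax\|_2 \ge |x_k|\, \mathrm{dist}(X_k, H_k)$ for every $k$, can be checked numerically. The sketch below computes $\mathrm{dist}(X_k, H_k)$ by Gram-Schmidt orthogonalization for a small Gaussian matrix. The helper names, the dimension $n = 5$, and the random seed are assumptions made for this example.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def distance_to_span(v, vectors):
    """Distance from v to span(vectors), via Gram-Schmidt orthogonalization."""
    basis = []
    for w in vectors:
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:                      # skip (numerically) dependent vectors
            basis.append([wi / norm for wi in w])
    for e in basis:                           # remove the component of v in the span
        c = dot(v, e)
        v = [vi - c * ei for vi, ei in zip(v, e)]
    return math.sqrt(dot(v, v))

rng = random.Random(1)
n = 5
columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]  # X_1, ..., X_n
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
norm_x = math.sqrt(dot(x, x))
x = [c / norm_x for c in x]                   # a point on the unit sphere

Ax = [sum(x[k] * columns[k][i] for k in range(n)) for i in range(n)]
norm_Ax = math.sqrt(dot(Ax, Ax))
lower_bounds = [abs(x[k]) * distance_to_span(columns[k], columns[:k] + columns[k + 1:])
                for k in range(n)]
print(norm_Ax, max(lower_bounds))
```

Each entry of `lower_bounds` is a valid lower bound for `norm_Ax`; the proof of Lemma 5.6 uses the one coming from a coordinate $k$ that is simultaneously large in $x$ and has a large distance.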

Lemma 5.6 reduces the invertibility problem to a lower bound on the distance
between a random vector and a random subspace. Now we reduce bounding
the distance to a small ball probability estimate.

Let $X_1, \dots, X_n$ be the column vectors of $A$. Let $X^*$ be any unit vector
orthogonal to $X_1, \dots, X_{n-1}$. We call it a random normal. We can choose
$X^*$ so that it is a random vector that depends only on $X_1, \dots, X_{n-1}$ and is
independent of $X_n$.

We clearly have
\[ \mathrm{dist}(X_n, H_n) \ge |\langle X^*, X_n \rangle|. \tag{5.11} \]
The vectors $X^* =: (a_1, \dots, a_n)$ and $X_n =: (\xi_1, \dots, \xi_n)$ are independent. Condition
on the vectors $X_1, \dots, X_{n-1}$. Then the vector $X^*$ can be viewed as fixed,
and the problem reduces to the small ball probability estimate for a linear
combination of independent random variables
\[ \langle X^*, X_n \rangle = \sum_{k=1}^n a_k \xi_k. \]

Assume for a moment that the distribution of a random variable $\xi$ is
absolutely continuous with bounded density. Then
\[ \mathbb{P}(|\xi| < t) \le C' t \quad \text{for any } t > 0. \tag{5.12} \]
This estimate can be extended to a linear combination of independent copies
of $\xi$. Therefore,
\[ \mathbb{P}(|\langle X^*, X_n \rangle| < t \mid X^*) \le Ct. \]
Integrating over $X_1, \dots, X_{n-1}$, we obtain
\[ \mathbb{P}(|\langle X^*, X_n \rangle| < t) \le Ct. \]
Thus, combining this estimate with Lemma 5.6, we prove that
\[ \mathbb{P}\Big( \inf_{x \in \mathrm{Incomp}} \|Ax\|_2 < \varepsilon \nu_2 n^{-1/2} \Big) \le C\varepsilon. \]
Then (5.9) and Lemma 5.4 imply Theorem 5.2 in this case.



6. Arithmetic structure and the small ball probability 



To prove Theorem 5.2 in the previous section, we used the small ball prob- 
ability estimate (5.12). However, this estimate does not hold for a general 
subgaussian random variable, and in particular for any random variable hav- 
ing an atom at 0. 

Despite this, a linear combination $\sum_{k=1}^n a_k \xi_k$ of independent copies of a
subgaussian random variable $\xi$ obeys an estimate similar to (5.12) for a typical
vector $a = (a_1, \dots, a_n)$. It is easy to see that such an estimate is impossible for
all vectors $a \in S^{n-1}$. Indeed, assume that $\xi$ is the random $\pm 1$ variable. Then
for
\[ a^{(1)} = \Big( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0, \dots, 0 \Big), \qquad \mathbb{P}\Big( \sum_{k=1}^n a^{(1)}_k \xi_k = 0 \Big) = \frac{1}{2}. \]
This singular behavior is due to the fact that the vector $a^{(1)}$ is sparse. If we
choose a vector $a$ which is far from the sparse ones, i.e. an incompressible
vector, the small ball probability may be significantly improved. Consider for
example, the vector
\[ a^{(2)} = \Big( \frac{1}{\sqrt{n}}, \dots, \frac{1}{\sqrt{n}} \Big). \]
Then by the Berry-Esseen Theorem,
\[ \mathbb{P}\Big( \Big| \sum_{k=1}^n a^{(2)}_k \xi_k \Big| \le t \Big) \le C \Big( t + \frac{1}{\sqrt{n}} \Big). \]
This estimate cannot be improved, since for an even $n$
\[ \mathbb{P}\Big( \sum_{k=1}^n a^{(2)}_k \xi_k = 0 \Big) \ge c\, n^{-1/2}. \]
The coordinates of the vector $a^{(2)}$ are the same, which results in a lot of
cancellations in the random sum $\sum_{k=1}^n a^{(2)}_k \xi_k$. If the arithmetic structure of
the coordinates of the vector $a$ is less rigid, the small ball probability can be
improved even further. For example, for the (not normalized) vector
\[ a^{(3)} = \Big( \frac{1 + 1/n}{\sqrt{n}}, \frac{1 + 2/n}{\sqrt{n}}, \dots, \frac{1 + n/n}{\sqrt{n}} \Big), \qquad \mathbb{P}\Big( \sum_{k=1}^n a^{(3)}_k \xi_k = 0 \Big) \sim n^{-3/2}. \]
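The effect of arithmetic structure on $\mathbb{P}(\sum_k a_k \xi_k = 0)$ can be computed exactly for integer coefficients by dynamic programming over partial sums. The sketch below compares three integer coefficient vectors chosen for illustration: a sparse one, a constant one, and the arithmetic progression $(n+1, \dots, 2n)$. Rescaling a vector does not change the probability of a zero sum, so integer representatives suffice. The function name and $n = 16$ are assumptions made for this example.

```python
from collections import Counter
from math import comb

def prob_zero_sum(coeffs):
    """P(sum_k a_k * xi_k = 0) for integer a_k and independent random signs xi_k."""
    dist = Counter({0: 1})            # number of sign patterns giving each partial sum
    for a in coeffs:
        nxt = Counter()
        for s, cnt in dist.items():
            nxt[s + a] += cnt
            nxt[s - a] += cnt
        dist = nxt
    return dist[0] / 2 ** len(coeffs)

n = 16
sparse = [1, 1] + [0] * (n - 2)               # two nonzero coordinates
flat = [1] * n                                # all coordinates equal
spread = [n + k for k in range(1, n + 1)]     # arithmetic progression n+1, ..., 2n

print(prob_zero_sum(sparse), prob_zero_sum(flat), prob_zero_sum(spread))
```

The three probabilities decrease in this order, matching the qualitative picture in the text: the less rigid the arithmetic structure, the smaller the chance of exact cancellation.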

Determining the influence of the arithmetic structure of the coordinates of
a vector $a$ on the small ball probability for the random sum $\sum_{k=1}^n a_k \xi_k$ became
known as the Littlewood-Offord Problem. It was investigated by Littlewood
and Offord, Erdős, Sárközy and Szemerédi, etc. Recently Tao and Vu [36] put
forward the inverse Littlewood-Offord theorems, stating that a large value
of the small ball probability implies a rigid arithmetic structure. The inverse
Littlewood-Offord theorems are extensively discussed in [31], see also [21] for
current results in this direction. We will need a result of this type for the
conditional argument to compensate for the lack of the bound (5.12).

The additive structure of a sequence $a = (a_1, \dots, a_n)$ of real numbers can
be described in terms of the shortest arithmetic progression into which it
embeds. This length is conveniently expressed as the least common denominator
of $a$, defined as follows:

\[ \mathrm{lcd}(a) := \inf\{\theta > 0 : \theta a \in \mathbb{Z}^n \setminus \{0\}\}. \]

For the vector $a^{(2)}$,

\[ \mathrm{lcd}(a^{(2)}) = \sqrt n \sim 1 \Big/ \mathbb{P}\Big(\sum_{k=1}^n a^{(2)}_k \xi_k = 0\Big). \]

A similar phenomenon occurs for the vector $a^{(3)}$:

\[ \mathrm{lcd}(a^{(3)}) = n^{3/2} \sim 1 \Big/ \mathbb{P}\Big(\sum_{k=1}^n a^{(3)}_k \xi_k = 0\Big). \]
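For vectors with rational coordinates the least common denominator can be computed exactly. The sketch below assumes the coordinates are given as reduced fractions $p_i/q_i$; then the smallest admissible $\theta$ is the least common multiple of the $q_i$ divided by the greatest common divisor of the $p_i$. The helper name `lcd` is illustrative.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcd(vector):
    """Smallest theta > 0 such that theta * a is a nonzero integer vector.

    `vector` is a nonzero sequence of Fractions; Fraction keeps each
    coordinate in lowest terms, which the formula below relies on.
    """
    if all(x == 0 for x in vector):
        raise ValueError("lcd is undefined for the zero vector")
    numerators = [abs(x.numerator) for x in vector]
    denominators = [x.denominator for x in vector]
    common_den = reduce(lambda u, v: u * v // gcd(u, v), denominators, 1)  # lcm
    common_num = reduce(gcd, numerators, 0)                                # gcd
    return Fraction(common_den, common_num)


n = 4
a3 = [1 + Fraction(k, n) for k in range(1, n + 1)]  # (5/4, 3/2, 7/4, 2)
theta = lcd(a3)
```

With the coordinates $1 + k/n$ taken literally, the computed value is $n$. Rescaling a vector by a factor $s > 0$ divides its lcd by $s$, so dividing these coordinates by $\sqrt n$ gives $n^{3/2}$, which may be the normalization behind the order quoted in the text.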












This suggests that the least common denominator of the sequence controls the
small ball probability. However, in the case when $t > 0$, or when the random
variable $\xi$ is not purely discrete, the precise inclusion $\theta a \in \mathbb{Z}^n \setminus \{0\}$ loses its
meaning. It should be relaxed to measure the closeness of the vector $\theta a$ to the
integer lattice. This leads us to the definition of the essential least common
denominator.

Fix a parameter $\gamma \in (0, 1)$. For $\alpha > 0$ define

\[ \mathrm{LCD}_\alpha(a) := \inf\big\{\theta > 0 : \mathrm{dist}(\theta a, \mathbb{Z}^n) < \min(\gamma\|\theta a\|_2, \alpha)\big\}. \]

The requirement that the distance is smaller than $\gamma\|\theta a\|_2$ forces us to consider
only non-trivial integer points as approximations of $\theta a$: only those in a small
aperture cone around the direction of $a$ (see the picture below).

One typically uses this definition with $\gamma$ a small constant, and for $\alpha = c\sqrt n$
with a small constant $c > 0$. The inequality $\mathrm{dist}(\theta a, \mathbb{Z}^n) < \alpha$ then yields that
most coordinates of $\theta a$ are within a small constant distance from integers. This
choice would allow us to conclude that the least common denominator of any
incompressible vector is of order at least $\sqrt n$. Let us formulate this statement
precisely.

Lemma 6.1. There exist constants $\gamma > 0$ and $\lambda > 0$ depending only on the
compressibility parameters $\delta, \rho$ such that any incompressible vector $a$ satisfies
$\mathrm{LCD}_\alpha(a) \ge \lambda\sqrt n$.

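Proof. Assume that $a$ is an incompressible vector, and let $\sigma(a)$ be the set
defined in Lemma 5.5. If $\mathrm{LCD}_\alpha(a) < \lambda\sqrt n$, then

\[ \|\theta a - z\|_2 < \gamma\theta < \gamma\lambda\sqrt n \quad \text{for some } \theta \in (0, \lambda\sqrt n),\ z \in \mathbb{Z}^n. \]

Let $I(a)$ be the set of all $j \in \{1, \dots, n\}$ such that

\[ |\theta a_j - z_j| < \sqrt{2/\nu_1}\,\gamma\lambda. \]

The previous inequality implies that $|I(a)| > (1 - \nu_1/2)n$. Therefore, for the
set $J(a) = I(a) \cap \sigma(a)$, we have

\[ |J(a)| > \frac{\nu_1}{2}\, n. \]

For any $j \in J(a)$, we have

\[ |z_j| < \theta|a_j| + \sqrt{2/\nu_1}\,\gamma\lambda < \lambda\sqrt n \cdot \frac{\nu_3}{\sqrt n} + \sqrt{2/\nu_1}\,\gamma\lambda < 1, \]

provided that $\lambda$ is chosen so that $\lambda\big(\nu_3 + \sqrt{2/\nu_1}\,\gamma\big) < 1$. Since $z_j \in \mathbb{Z}$, this means
that $z_j = 0$. Finally, this implies

\[ \|\theta a - z\|_2 \ge \Big(\sum_{j \in J(a)} (\theta a_j)^2\Big)^{1/2} \ge \theta \cdot \frac{\nu_2}{\sqrt n} \cdot \Big(\frac{\nu_1}{2}\, n\Big)^{1/2} = \nu_2\sqrt{\nu_1/2}\;\theta > \gamma\theta \]

for $\gamma < \nu_2\sqrt{\nu_1/2}$. This contradicts the assumption that $\mathrm{LCD}_\alpha(a) < \lambda\sqrt n$. □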



We fix $\gamma$ satisfying Lemma 6.1 for the rest of the proof.

The following theorem gives a bound on the small ball probability for a 
random sum in terms of the additive structure of a. The less structure a has, 
the bigger its least common denominator is, and the smaller the small ball 
probability is. 

Theorem 6.2 (Small ball probability). Let $\xi_1, \dots, \xi_n$ be independent copies of
a centered subgaussian random variable $\xi$ of unit variance. Consider a sequence
$a = (a_1, \dots, a_n) \in S^{n-1}$. Then, for every $\alpha > 0$, and for

\[ \varepsilon \ge \frac{(4/\pi)}{\mathrm{LCD}_\alpha(a)}, \]

we have

\[ \mathbb{P}\Big(\Big|\sum_{k=1}^n a_k \xi_k\Big| \le \varepsilon\Big) \le C\varepsilon + Ce^{-c\alpha^2}. \]

We shall prove more than is claimed in the Theorem. Instead of the small
ball probability we shall bound a parameter which controls the concentration
of a random variable around any fixed point.

Definition 6.3. The Lévy concentration function of a random variable $S$ is
defined for $\varepsilon > 0$ as

\[ \mathcal{L}(S, \varepsilon) = \sup_{v \in \mathbb{R}} \mathbb{P}(|S - v| \le \varepsilon). \]

The proof of the Theorem uses the Fourier-analytic approach developed by
Halász [H], [T3].

We start with the classical Lemma of Esseen, which estimates the Lévy
concentration function in terms of the characteristic function of a random variable.

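Lemma 6.4. Let $Y$ be a real-valued random variable. Then

\[ \sup_{v \in \mathbb{R}} \mathbb{P}(|Y - v| \le 1) \le C \int_{-2}^{2} |\varphi_Y(\theta)|\, d\theta, \]

where $\varphi_Y(\theta) = \mathbb{E}\exp(i\theta Y)$ is the characteristic function of $Y$.

Proof. Let $\psi = \chi_{[-1,1]} * \chi_{[-1,1]}$ and let $f = \hat\psi$:

\[ f(t) = \Big(\frac{2\sin t}{t}\Big)^2. \]

Then both $f \in L_1(\mathbb{R})$ and $\psi \in L_1(\mathbb{R})$, so $f$ satisfies the Fourier inversion
formula. Note also that $f(t) \ge c$ whenever $|t| \le 1$. Therefore,

\[ \mathbb{P}(|Y - v| \le 1) = \mathbb{E}\chi_{[-1,1]}(Y - v) \le \frac1c\, \mathbb{E} f(Y - v)
 = \frac1c\, \mathbb{E}\Big[\frac{1}{2\pi}\int_{\mathbb{R}} \psi(\theta) e^{i\theta(Y - v)}\, d\theta\Big]
 \le \frac{1}{2\pi c}\int_{\mathbb{R}} \psi(\theta)\, \big|\mathbb{E} e^{i\theta(Y - v)}\big|\, d\theta
 \le \frac{1}{\pi c}\int_{-2}^{2} \big|\mathbb{E} e^{i\theta Y}\big|\, d\theta. \]

The last inequality follows from $\mathrm{supp}(\psi) = [-2, 2]$ and $\psi(x) \le 2$. □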



Proof of Theorem 6.2. To make the proof more transparent, we shall assume
that $\xi$ is the random $\pm 1$ variable. The general case is considered in [24].

Let $S = \sum_{j=1}^n a_j \xi_j$. Applying Esseen's Lemma to the random variable
$Y = S/\varepsilon$, we obtain

\[ \mathcal{L}(S, \varepsilon) \le C \int_{-2}^{2} |\varphi_S(\theta/\varepsilon)|\, d\theta = C \int_{-2}^{2} \prod_{j=1}^n |\varphi_j(\theta/\varepsilon)|\, d\theta, \tag{6.1} \]

where

\[ \varphi_j(t) = \mathbb{E}\exp(i a_j \xi_j t) = \cos(a_j t). \]

The last equality in (6.1) follows from the independence of $\xi_j$, $j = 1, \dots, n$.
The inequality $|x| \le \exp(-\frac12(1 - x^2))$, which is valid for all $x \in \mathbb{R}$, implies

\[ |\varphi_j(t)| \le \exp\Big(-\frac12 \sin^2(a_j t)\Big) \le \exp\Big(-\frac12 \min_{q \in \mathbb{Z}} \Big|\frac{2}{\pi}\, a_j t - q\Big|^2\Big). \]

In the last inequality we estimated the absolute value of the sine by a piecewise
linear function, see the picture below.
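Since the picture is not reproduced here, the two elementary inequalities behind this estimate can be checked numerically. The sketch below tests, on a grid of points $u = a_j t$, that $|\cos u| \le \exp(-\frac12 \sin^2 u) \le \exp(-\frac12\, \mathrm{dist}(2u/\pi, \mathbb{Z})^2)$. The grid range and the tolerance are arbitrary choices made for this illustration.

```python
import math


def dist_to_integers(v):
    """Distance from the real number v to the nearest integer."""
    return abs(v - round(v))


def chain_holds(u, tol=1e-12):
    """Check |cos u| <= exp(-sin^2(u)/2) <= exp(-dist(2u/pi, Z)^2 / 2) at u."""
    lhs = abs(math.cos(u))
    mid = math.exp(-0.5 * math.sin(u) ** 2)
    rhs = math.exp(-0.5 * dist_to_integers(2 * u / math.pi) ** 2)
    return lhs <= mid + tol and mid <= rhs + tol


# Grid on [-20, 20] with step 0.001; a sanity check, not a proof.
grid = [k * 0.001 for k in range(-20000, 20001)]
all_ok = all(chain_holds(u) for u in grid)
```

The second inequality is the piecewise linear minorant of $|\sin u|$: on each half-period the sine lies above the tent function $\mathrm{dist}(2u/\pi, \mathbb{Z})$.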




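Combining the previous inequalities, we get

\[ \mathcal{L}(S, \varepsilon) \le C \int_{-2}^{2} \exp\Big(-\frac12 \sum_{j=1}^n \min_{q_j \in \mathbb{Z}} \Big|\frac{2}{\pi\varepsilon}\, a_j\theta - q_j\Big|^2\Big)\, d\theta = C \int_{-2}^{2} \exp(-h^2(\theta)/2)\, d\theta, \tag{6.2} \]

where

\[ h(\theta) = \min_{p \in \mathbb{Z}^n} \Big\|\frac{2}{\pi\varepsilon}\, \theta a - p\Big\|_2. \]

Since by the assumption, $4/(\pi\varepsilon) \le \mathrm{LCD}_\alpha(a)$, the definition of the least
common denominator implies that for any $\theta \in [-2, 2]$,

\[ h(\theta) \ge \min\Big(\gamma\, \frac{2|\theta|}{\pi\varepsilon}\, \|a\|_2,\ \alpha\Big). \]
Recall that $\|a\|_2 = 1$. Then the previous inequality implies

\[ \exp(-h^2(\theta)/2) \le \exp\Big(-\frac12 \Big(\frac{2\gamma\theta}{\pi\varepsilon}\Big)^2\Big) + \exp(-\alpha^2/2). \]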



Substituting this into (6.2) we complete the proof. 
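As a worked version of this last step (a sketch for the $\pm 1$ case treated above, with the constant $C$ from (6.2) left unspecified), one can extend the Gaussian integral to the whole line and use $\int_{\mathbb{R}} e^{-b\theta^2}\, d\theta = \sqrt{\pi/b}$ with $b = 2\gamma^2/(\pi^2\varepsilon^2)$:

```latex
\begin{align*}
\mathcal{L}(S,\varepsilon)
 &\le C\int_{-2}^{2}\exp\Big(-\frac{2\gamma^2\theta^2}{\pi^2\varepsilon^2}\Big)\,d\theta
    + C\int_{-2}^{2} e^{-\alpha^2/2}\,d\theta \\
 &\le C\int_{\mathbb{R}}\exp\Big(-\frac{2\gamma^2\theta^2}{\pi^2\varepsilon^2}\Big)\,d\theta
    + 4C e^{-\alpha^2/2}
  = C\,\frac{\pi^{3/2}}{\sqrt{2}\,\gamma}\,\varepsilon + 4C e^{-\alpha^2/2}.
\end{align*}
```

Since $\gamma$ has been fixed, the right-hand side is of the form $C'\varepsilon + C' e^{-\alpha^2/2}$.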



□ 



To apply the previous result to random matrices we shall combine it with
the following Tensorization Lemma.

Lemma 6.5 (Tensorization). Let $\zeta_1, \dots, \zeta_m$ be independent real random
variables, and let $K, \varepsilon_0 \ge 0$. Assume that for each $k$

\[ \mathbb{P}(|\zeta_k| < \varepsilon) \le K\varepsilon \quad \text{for all } \varepsilon \ge \varepsilon_0. \]

Then

\[ \mathbb{P}\Big(\sum_{k=1}^m \zeta_k^2 < \varepsilon^2 m\Big) \le (CK\varepsilon)^m \quad \text{for all } \varepsilon \ge \varepsilon_0, \]

where $C$ is an absolute constant.



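Proof. Let $\varepsilon \ge \varepsilon_0$. We have

\[ \mathbb{P}\Big(\sum_{k=1}^m \zeta_k^2 < \varepsilon^2 m\Big) = \mathbb{P}\Big(m - \frac{1}{\varepsilon^2}\sum_{k=1}^m \zeta_k^2 > 0\Big) \le \mathbb{E}\exp\Big(m - \frac{1}{\varepsilon^2}\sum_{k=1}^m \zeta_k^2\Big) = e^m \prod_{k=1}^m \mathbb{E}\exp(-\zeta_k^2/\varepsilon^2). \tag{6.3} \]

By Fubini's theorem,

\[ \mathbb{E}\exp(-\zeta_k^2/\varepsilon^2) = \mathbb{E}\int_0^{\exp(-\zeta_k^2/\varepsilon^2)} ds = \int_0^1 \mathbb{P}\big(\exp(-\zeta_k^2/\varepsilon^2) > s\big)\, ds = \int_0^\infty 2u e^{-u^2}\, \mathbb{P}(|\zeta_k| < \varepsilon u)\, du, \]

where the last equality is the change of variables $s = e^{-u^2}$.
For $u \in (0, 1)$, we have $\mathbb{P}(|\zeta_k| < \varepsilon u) \le \mathbb{P}(|\zeta_k| < \varepsilon) \le K\varepsilon$. This and the
assumption of the lemma yield

\[ \mathbb{E}\exp(-\zeta_k^2/\varepsilon^2) \le \int_0^1 2u e^{-u^2} K\varepsilon\, du + \int_1^\infty 2u e^{-u^2} K\varepsilon u\, du \le CK\varepsilon. \]

Putting this into (6.3) yields

\[ \mathbb{P}\Big(\sum_{k=1}^m \zeta_k^2 < \varepsilon^2 m\Big) \le e^m (CK\varepsilon)^m. \]

This completes the proof. □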
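The absolute constant in the proof can be made explicit. The sketch below evaluates the two integrals $\int_0^1 2u e^{-u^2}\, du$ and $\int_1^\infty 2u^2 e^{-u^2}\, du$ in closed form and cross-checks the result with a midpoint rule; the function names and the truncation point of the numerical integral are choices made for this illustration.

```python
import math


def tensorization_integral():
    """Closed form of int_0^1 2u e^{-u^2} du + int_1^inf 2u^2 e^{-u^2} du."""
    first = 1.0 - math.exp(-1.0)
    # Integration by parts: int_1^inf 2u^2 e^{-u^2} du = e^{-1} + int_1^inf e^{-u^2} du.
    second = math.exp(-1.0) + 0.5 * math.sqrt(math.pi) * math.erfc(1.0)
    return first + second


def midpoint_check(step=1e-4, upper=12.0):
    """Midpoint-rule approximation of the same two integrals on (0, upper)."""
    total = 0.0
    for i in range(int(upper / step)):
        u = (i + 0.5) * step
        weight = 1.0 if u < 1.0 else u  # bound K*eps for u < 1, K*eps*u for u >= 1
        total += 2 * u * math.exp(-u * u) * weight * step
    return total


value = tensorization_integral()
constant = math.e * value  # the factor e comes from e^m in (6.3)
```

Under this reading, $\mathbb{E}\exp(-\zeta_k^2/\varepsilon^2) \le 1.14\, K\varepsilon$, so the lemma holds with an absolute constant $C$ of roughly $3.1$.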



Combining Theorem 6.2 and Lemma 6.5 yields the multidimensional small
ball probability estimate similar to the one we had for an absolutely continuous
random variable.

Lemma 6.6 (Invertibility on a single vector). Let $A'$ be an $m \times n$ random
matrix, whose entries are independent copies of a centered subgaussian random
variable $\xi$. Then for any $\alpha > 0$, for every vector $x \in S^{n-1}$, and for every $t \ge 0$
satisfying

\[ t \ge \max\Big(\frac{(4/\pi)}{\mathrm{LCD}_\alpha(x)},\ e^{-c\alpha^2}\Big), \]

one has

\[ \mathbb{P}\big(\|A'x\|_2 < t n^{1/2}\big) \le (Ct)^m. \]

7. Putting all ingredients together

Now we have developed all necessary tools to prove the central result.

Theorem 5.2 (Invertibility: subgaussian). Let $A$ be an $n \times n$ matrix whose
entries are independent copies of a centered subgaussian real random variable.
Then for every $\varepsilon > 0$ one has

\[ \mathbb{P}\big(s_n(A) \le \varepsilon n^{-1/2}\big) \le C\varepsilon + c^n, \]

where $C > 0$ and $c \in (0, 1)$.

Recall that we have divided the unit sphere into compressible and
incompressible vectors (see Definition 5.3 and inequality (5.9)), and proved that the
first term in (5.9) is exponentially small. Applying Lemma 5.6 and (5.11), we
reduced the estimate for the second term to the bound for

\[ p(\varepsilon) := \mathbb{P}\big(|\langle X^*, X_n\rangle| \le \varepsilon\big), \]

where $X_n$ is the $n$-th column of the matrix $A$, and $X^*$ is a unit vector orthogonal
to the first $n - 1$ columns. To complete the proof, we have to show that

\[ p(\varepsilon) \le C\varepsilon, \tag{7.1} \]

whenever $\varepsilon \ge e^{-cn}$. Here $\langle X^*, X_n\rangle = \sum_{k=1}^n X^*_k \xi_k$, where $X^* = (X^*_1, \dots, X^*_n)$.
Throughout the rest of the proof set

\[ \alpha = \beta\sqrt n, \tag{7.2} \]

where $\beta > 0$ is a small absolute constant, which will be chosen at the end of the
proof. If $\mathrm{LCD}_\alpha(X^*) \ge e^{cn}$, then (7.1) follows from Theorem 6.2. Therefore,
our problem has been further reduced to

Theorem 7.1 (Random normal). Let $X_1, \dots, X_{n-1}$ be random vectors whose
coordinates are independent copies of a centered subgaussian random variable
$\xi$. Consider a unit vector $X^*$ orthogonal to all these vectors. There exist
constants $c, c' > 0$ such that

\[ \mathbb{P}\big(\mathrm{LCD}_\alpha(X^*) < e^{cn}\big) \le e^{-c'n}. \]



Intuitively, the components of a random vector should be arithmetically 
incommensurate to the extent that their essential LCD is exponential in n. 
This is rather obvious for a random vector uniformly distributed over the 
sphere. However, the distribution of the random normal X* is more involved, 
and it requires some work to confirm this intuition. 

Proof. Let $A'$ be the $(n-1) \times n$ matrix with rows $X_1^T, \dots, X_{n-1}^T$. Then
$X^* \in \mathrm{Ker}(A')$. The matrix $A'$ has i.i.d. entries. We start with using the
decomposition similar to (5.9):

\[ \mathbb{P}\big(\exists X^* \in S^{n-1}:\ \mathrm{LCD}_\alpha(X^*) < e^{cn} \text{ and } A'X^* = 0\big)
 \le \mathbb{P}\big(\exists X^* \in \mathrm{Comp}:\ A'X^* = 0\big)
 + \mathbb{P}\big(\exists X^* \in \mathrm{Incomp}:\ \mathrm{LCD}_\alpha(X^*) < e^{cn} \text{ and } A'X^* = 0\big). \]



Lemma 5.4 implies that the first term in the right hand side does not exceed
$e^{-cn}$. Formally, we have to reprove this lemma for $(n-1) \times n$ matrices, instead
of the $n \times n$ ones, but the proof extends to this case without any changes.

To bound the second term, we introduce a new decomposition of the sphere.
Recall that by Lemma 6.1, any incompressible vector $a$ satisfies $\mathrm{LCD}_\alpha(a) \ge \lambda\sqrt n$.
For $D > 0$, set

\[ S_D = \{x \in S^{n-1} \mid D \le \mathrm{LCD}_\alpha(x) < 2D\}. \]

It is enough to prove that

\[ \mathbb{P}(\exists x \in S_D:\ A'x = 0) \le e^{-n} \]

whenever $\lambda\sqrt n \le D \le e^{cn}$. Indeed, the statement of the Theorem will then
follow by taking the union bound over $D = 2^k$ for $k \le cn$.

To this end, we shall use the $\varepsilon$-net argument to bound $\|A'x\|_2$ below. For
a fixed $x \in S_D$, the required estimate follows from substituting the bound
$\mathrm{LCD}_\alpha(x) \ge D$ in Lemma 6.6:

\[ \mathbb{P}\big(\|A'x\|_2 < t n^{1/2}\big) \le (Ct)^{n-1}, \tag{7.3} \]

provided that $t$ satisfies the lower bound required in Lemma 6.6 with $\mathrm{LCD}_\alpha(x)$
replaced by $D$. To estimate the size of the $\varepsilon$-net we use the bound for the
essential least common denominator again. The simple volumetric bound is
not sufficient for our purposes, and this is the crucial step where we exploit
the additive structure of $S_D$ to construct a smaller net.

Lemma 7.2 (Nets of level sets). There exists a $(4\alpha/D)$-net in $S_D$ of cardinality
at most $(CD/\sqrt n)^n$.

Proof. We can assume that $4\alpha/D \le 1$, otherwise the conclusion is trivial. To
shorten the notation, denote for $x \in S_D$

\[ D(x) := \mathrm{LCD}_\alpha(x). \]

By the definition of $S_D$, we have $D \le D(x) < 2D$. By the definition of the
essential least common denominator, there exists $p \in \mathbb{Z}^n$ such that

\[ \|D(x)x - p\|_2 < \alpha. \tag{7.4} \]

Therefore

\[ \Big\|x - \frac{p}{D(x)}\Big\|_2 < \frac{\alpha}{D(x)} \le \frac{\alpha}{D} \le \frac14. \]

Since $\|x\|_2 = 1$, it follows that

\[ \Big\|x - \frac{p}{\|p\|_2}\Big\|_2 < \frac{2\alpha}{D}. \tag{7.5} \]

On the other hand, by (7.4) and using $\|x\|_2 = 1$, $D(x) \le 2D$ and $4\alpha/D \le 1$,
we obtain

\[ \|p\|_2 < D(x) + \alpha \le 2D + \alpha \le 3D. \tag{7.6} \]

Inequalities (7.5) and (7.6) show that the set

\[ \mathcal{N} := \Big\{\frac{p}{\|p\|_2} : p \in \mathbb{Z}^n \cap B(0, 3D)\Big\} \]

is a $(2\alpha/D)$-net of $S_D$. Recall that, by a known volumetric argument, the
number of integer points in $B(0, 3D)$ is at most $(1 + 9D/\sqrt n)^n \le (CD/\sqrt n)^n$
(where in the last inequality we used that by the definition of the level set,
$D \ge c_0\sqrt n$ for all incompressible vectors). Finally, we can find a $(4\alpha/D)$-net
of the same cardinality, which lies in $S_D$. □
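The volumetric count of integer points used in this proof can be illustrated in small dimensions. The sketch below counts the points of $\mathbb{Z}^n$ in a Euclidean ball by recursion over coordinates and compares the count with $(1 + 9D/\sqrt n)^n$; it assumes that the squared radius $(3D)^2$ is an integer, and the function names are illustrative.

```python
import math


def count_lattice_points(n, radius_sq):
    """Number of p in Z^n with |p|_2^2 <= radius_sq (a non-negative integer)."""
    if n == 0:
        return 1
    bound = math.isqrt(radius_sq)  # largest admissible |coordinate|
    return sum(
        count_lattice_points(n - 1, radius_sq - k * k)
        for k in range(-bound, bound + 1)
    )


def volumetric_bound(n, D):
    """The bound (1 + 9 D / sqrt(n))^n quoted in the proof for the ball B(0, 3D)."""
    return (1 + 9 * D / math.sqrt(n)) ** n


# Ball of radius 3D with D = 1 in dimensions 2 and 3.
count_2d = count_lattice_points(2, 9)
count_3d = count_lattice_points(3, 9)
```

In these two cases the counts are 29 and 123, against bounds of about 54 and 238. The check covers only tiny dimensions and is not a substitute for the volumetric argument.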



Now we can complete the $\varepsilon$-net argument. Recall that by Proposition 4.4

\[ \mathbb{P}\big(s_1(A') \ge C_0\sqrt n\big) \le e^{-cn}. \]

Therefore, in order to complete the proof, it is enough to show that the event

\[ \mathcal{E} := \big\{\exists x \in S_D:\ A'x = 0 \text{ and } \|A'\| \le C_0\sqrt n\big\} \]

has probability at most $e^{-n}$.

Assume that $\mathcal{E}$ occurs, and let $x \in S_D$ be such that $A'x = 0$. Let $\mathcal{N}$ be the
$(4\alpha/D)$-net constructed in Lemma 7.2. Choose $y \in \mathcal{N}$ such that $\|x - y\|_2 < 4\alpha/D$.
Then by the triangle inequality,

\[ \|A'y\|_2 \le \|A'\| \cdot \|x - y\|_2 \le C_0\sqrt n \cdot \frac{4\alpha}{D} = \frac{4C_0\beta n}{D}, \]

if we recall that $\alpha = \beta\sqrt n$. Set $t = 4C_0\beta\sqrt n/D$. Combining the estimate (7.3)
for this $t$ with the union bound, we obtain

\[ \mathbb{P}(\mathcal{E}) \le \mathbb{P}\big(\exists y \in \mathcal{N}:\ \|A'y\|_2 \le t\sqrt n\big) \le |\mathcal{N}| \cdot (Ct)^{n-1} \le \Big(\frac{CD}{\sqrt n}\Big)^n \cdot (Ct)^{n-1} = \frac{CD}{\sqrt n}\, (C'\beta)^{n-1}, \]

where $C' = 4C^2C_0$. Since $D \le e^{cn}$, we can choose the constant $\beta$ so that the right hand side of
the previous inequality will be less than $e^{-n}$. The proof of Theorem 5.2 is
complete. □

8. Short Khinchin inequality

Let $1 \le p < \infty$. Recall that $\|\cdot\|_p$ denotes the standard $\ell_p$ norm, and $B_p^n$ its
unit ball.

Let $X \in \mathbb{R}^n$ be a vector with independent centered random $\pm 1$ coordinates,
i.e. a random vertex of the discrete cube $\{-1, 1\}^n$. The classical Khinchin
inequality, Theorem 3.4, asserts that for any $p \ge 1$ and for any vector $a \in \mathbb{R}^n$,
$(\mathbb{E}|\langle a, X\rangle|^p)^{1/p} \sim_p \|a\|_2$. This equivalence can be obtained if one averages not
over the whole discrete cube, but over some small part of it. The question of how
small this set can be has been around since the mid-seventies. More precisely,

    Let $p \ge 1$. Find constants $\alpha_p, \beta_p$ and a set $V \subset \{-1, 1\}^n$ of a
    small cardinality such that

    \[ \alpha_p \|a\|_2 \le \Big(\frac{1}{|V|}\sum_{x \in V} |\langle a, x\rangle|^p\Big)^{1/p} \le \beta_p \|a\|_2 \]

    for any $a \in \mathbb{R}^n$.

Deterministic constructions of sets $V$ of reasonably small cardinality are
unknown. Therefore, we shall construct the set $V$ probabilistically. Namely, we
choose $N = N(n, p)$ and consider $N$ independent copies $X_1, \dots, X_N$ of the
random vector $X$. If $N \ll 2^{n/2}$, in particular, if $N$ is polynomial in $n$, all
vectors $X_1, \dots, X_N$ are distinct with high probability. The problem thus is
reduced to showing that with high probability, any vector $a \in \mathbb{R}^n$ satisfies

\[ \alpha_p \|a\|_2 \le \Big(\frac1N \sum_{j=1}^N |\langle a, X_j\rangle|^p\Big)^{1/p} \le \beta_p \|a\|_2. \tag{8.1} \]

This problem can be recast in the language of random matrices. Let $A$ be the
$N \times n$ matrix with rows $X_1, \dots, X_N$. Then the inequality above means that
$A$ defines a nice isomorphic embedding of $\ell_2^n$ into $\ell_p^N$.

As in the proof of the original Khinchin inequality, we consider cases $p = 1$
and $p > 2$ separately.
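For very small $n$ the averages in the problem statement can be computed by brute force. The sketch below compares the average over the whole cube with the average over a chosen subset $V$; the function names are illustrative, and the two-dimensional example was picked by hand so that the arithmetic is easy to verify.

```python
import itertools


def cube_moment(a, p):
    """(E |<a, X>|^p)^(1/p) for X uniform on {-1, 1}^n, by exhaustive enumeration."""
    n = len(a)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        total += abs(sum(s * x for s, x in zip(signs, a))) ** p
    return (total / 2 ** n) ** (1.0 / p)


def subset_moment(a, p, vertices):
    """The same average taken only over the listed cube vertices (the set V)."""
    total = sum(abs(sum(s * x for s, x in zip(v, a))) ** p for v in vertices)
    return (total / len(vertices)) ** (1.0 / p)


a = [3, 4]                                       # ||a||_2 = 5
full_l2 = cube_moment(a, 2)                      # equals ||a||_2 exactly
full_l1 = cube_moment(a, 1)                      # (7 + 1 + 1 + 7) / 4 = 4
half_l1 = subset_moment(a, 1, [(1, 1), (1, -1)])  # (7 + 1) / 2 = 4
```

Here two of the four vertices already reproduce the full average, because the other two are their negatives. In higher dimensions no such trivial reduction gives a small set, which is why the text turns to random choices of $V$.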

8.1. Short Khinchin inequality for p = 1. In this case we derive the
inequality (8.1) in a more general setup. Assume that the coordinates of the
vector $X$ are i.i.d. centered subgaussian variables. Then Proposition 4.4
combined with the inequality $\|A : \ell_2^n \to \ell_1^N\| \le \sqrt N \cdot \|A : \ell_2^n \to \ell_2^N\|$ yields the
following

Proposition 8.1. Let $A$ be an $N \times n$ random matrix, $N \ge n$, whose entries
are independent copies of a subgaussian random variable. Then

\[ \mathbb{P}\big(\|A : \ell_2^n \to \ell_1^N\| \ge tN\big) \le e^{-c_0 t^2 N} \quad \text{for } t \ge C_0. \]

This implies the second inequality in (8.1) with $\beta_1 = C_0$, so (8.1) is reduced
to the first inequality. To establish it we apply the random matrix machinery
developed in the previous sections. Without loss of generality, we may assume
that $n \le N \le 2n$, because we are looking for small values of $N$. Then the
following Theorem shows that the short Khinchin inequality holds for any
$N > n$ with $\alpha_1$ depending only on the ratio $N/n$.

Theorem 8.2. Let $n, N$ be natural numbers such that $n \le N \le 2n$. Let $\xi$
be a centered subgaussian random variable of variance 1. Let $A$ be an $N \times n$
matrix, whose entries are independent copies of $\xi$. Set $m = N - n + 1$. Then
for any $\varepsilon > 0$

\[ \mathbb{P}\big(\exists x \in S^{n-1}:\ \|Ax\|_1 < \varepsilon m\big) \le \Big(\frac{CN}{m}\, \varepsilon\Big)^m + C\exp(-cn). \]

Proof. Adding to the entries of $A$ small multiples of independent $N(0, 1)$
variables, we may assume that the entries of $A$ are absolutely continuous, so
the matrix $A$ is of full rank almost surely.

We start with an elementary lemma from linear algebra.

Lemma 8.3. Let $N \ge n$ and let $A : \mathbb{R}^n \to \mathbb{R}^N$ be a random matrix with
absolutely continuous entries. Let $x \in S^{n-1}$ be a vector for which $\|Ax\|_1$ attains
the minimal value. Then

\[ |\mathrm{supp}(Ax)| = N - n + 1 \]

almost surely.

Proof. Let $E = A\mathbb{R}^n$ and let $K = B_1^N \cap E$. Set $y = Ax/\|Ax\|_1$. Since the
function $g : S^{n-1} \to (0, \infty)$, $g(u) = \|Au\|_1$ attains the minimum at $u = x$,
the function $f : K \to (0, \infty)$, $f(z) = \|A^{-1}|_E\, z\|_2$ attains the maximum over
$K$ at $z = y$. The convexity of $\|\cdot\|_2$ implies that $y$ is an extreme point of
$K$. Since $K$ is the intersection of the octahedron $B_1^N$ with an $n$-dimensional
subspace, this means that $|\mathrm{supp}\, y| \le N - n + 1$. Finally, since the entries
of $A$ are absolutely continuous, any coordinate subspace $F \subset \mathbb{R}^N$, whose
dimension does not exceed $N - n$, satisfies $E \cap F = \{0\}$ a.s. Therefore,
$|\mathrm{supp}\, y| = N - n + 1$. □

This Lemma allows us to reduce the minimum of $\|Ax\|_1$ over the whole
sphere $S^{n-1}$ to a certain finite subset of it. To each subset $J \subset \{1, \dots, N\}$
of cardinality $m = N - n + 1$ corresponds a unique pair of extreme points $x_J$
and $-x_J$ of $K$ such that $\sum_{j \in J} |x_J(j)| = 1$ and $x_J(j) = 0$ whenever $j \notin J$.

Let $A_{J'}$ be the matrix consisting of the rows of $A$ whose numbers belong to
$J' = \{1, \dots, N\} \setminus J$. The vector $y_J \in S^{n-1}$ such that $Ay_J = t x_J$ for some $t > 0$
is uniquely defined by the matrix $A_{J'}$ via the condition $A_{J'} y_J = 0$. By Lemma
8.3,

\[ \min\{\|Ay\|_1 \mid y \in S^{n-1}\} = \min\{\|Ay_J\|_1 \mid J \subset \{1, \dots, N\},\ |J| = m\}. \]



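To finish the proof, we estimate each $\|Ay_J\|_1$ below and apply the union bound.
Fix a set $J \subset \{1, \dots, N\}$ of cardinality $m$. Denote the rows of the matrix
$A_{J'}$ by $X_1^T, \dots, X_{n-1}^T$. Applying Theorem 7.1 to the vectors $X_1, \dots, X_{n-1}$, we
conclude that

\[ \mathbb{P}\big(\mathrm{LCD}_\alpha(y_J) < e^{cn}\big) \le e^{-c'n}. \tag{8.2} \]

Conditioning on the matrix $A_{J'}$, we may regard the vector $y_J$ as fixed. Denote
a row of the matrix $A_J$ by $Y^T$, so the coordinates of $A_J y_J$ are distributed like
$\langle Y, y_J\rangle$. If $\mathrm{LCD}_\alpha(y_J) \ge e^{cn}$, then by Theorem 6.2

\[ \mathbb{P}\big(|\langle Y, y_J\rangle| < \varepsilon \mid A_{J'}\big) \le C\varepsilon, \]

whenever $\varepsilon \ge Ce^{-cn}$. Then taking expectation over $A_{J'}$ and using (8.2) yields

\[ \mathbb{P}\big(|\langle Y, y_J\rangle| < \varepsilon\big) \le C'\varepsilon + C'e^{-cn} + e^{-c'n} \]

for any $\varepsilon > 0$. Coordinates $\zeta_j$, $j \in J$ of the vector $A_J y_J$ are i.i.d. random
variables. Tensorization Lemma 6.5 can be easily reproved for $\sum_j |\zeta_j|$ instead
of $\sum_j \zeta_j^2$. In this form it implies

\[ \mathbb{P}\big(\|Ay_J\|_1 < \varepsilon m\big) = \mathbb{P}\big(\|A_J y_J\|_1 < \varepsilon m\big) \le \big(C\varepsilon + Ce^{-cn}\big)^m \]

for any $\varepsilon > 0$. Finally, taking the union bound over all sets $J$, we obtain

\[ \mathbb{P}\big(\exists J:\ |J| = m,\ \|Ay_J\|_1 < \varepsilon m\big) \le \binom{N}{m} \cdot \big(C\varepsilon + Ce^{-cn}\big)^m \le \Big(\frac{CN}{m}\, \varepsilon\Big)^m + Ce^{-cn}. \]

□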



Assume now that $N$ is in a fixed proportion to $n$, and define $\delta$ by $N = (1 + \delta)n$.
Then Theorem 8.2 implies that, with high probability, the short
Khinchin inequality holds for $N$ independent subgaussian vectors with constant
$\alpha_1 = c\delta^2$. To see this, set $\varepsilon = c'\delta$ with a sufficiently small constant $c'$ to make
the right-hand side of the inequality in Theorem 8.2 non-trivial.

Theorem 8.2 proves more than the short Khinchin inequality. Combining it
with Proposition 4.4 we show that

\[ \forall x \in \mathbb{R}^n \quad \varepsilon\delta n\, \|x\|_2 \le \|Ax\|_1 \le \sqrt N\, \|Ax\|_2 \le Cn\, \|x\|_2 \tag{8.3} \]

with probability greater than $1 - C\exp(-cn) - (\varepsilon/c\delta)^{\delta n}$. This immediately
yields a lower bound for the smallest singular value of a rectangular random
matrix.

Corollary 8.4. Let $n, N, \delta, A, \varepsilon$ be as above. Then the smallest singular value
of $A$ is bounded below by $\varepsilon\delta \cdot \sqrt n$ with probability at least $1 - C\exp(-cn) - (\varepsilon/c\delta)^{\delta n}$.

This bound is not sharp for small $\delta$. The optimal estimate, valid for all $n$, $N$
and $\varepsilon$, was recently obtained in [25].

A celebrated theorem of Kashin [12] states that a random $n$-dimensional
section of the standard octahedron $B_1^N$ of dimension $N = \lfloor(1 + \delta)n\rfloor$ is close
to the section of the inscribed ball $(1/\sqrt N)B_2^N$. The optimal estimates for the
diameter of a random section of the octahedron were obtained by Garnaev and
Gluskin [7]. Recently, attention has turned to the question of whether
almost spherical sections of the octahedron can be generated by simple random
matrices, in particular by a random $\pm 1$ matrix. A general result proved in [18]
implies that if $N = \lfloor(1 + \delta)n\rfloor$ with $\delta \ge c/\log n$, then a random $N \times n$ matrix
with independent subgaussian entries generates a section of the octahedron
$B_1^N$ which is not far from the ball with probability exponentially close to 1.
For random $\pm 1$ matrices this result was improved by Artstein-Avidan et al.
[2], who proved a polynomial type estimate for the diameter of a section for
$\delta \ge Cn^{-1/10}$. Using (8.3) we obtain a polynomial estimate for the diameter of
sections for smaller values of $\delta$.

Corollary 8.5. Let $n, N$ be natural numbers such that $n < N < 2n$. Denote
$\delta = (N - n)/n$. Let $\xi$ be a centered subgaussian random variable. Let $A$ be
an $N \times n$ matrix, whose entries are independent copies of $\xi$ and let $E = A\mathbb{R}^n$.
Then for any $\varepsilon > 0$

\[ \mathbb{P}\Big(\forall y \in E,\ \|y\|_1 \le \sqrt N\, \|y\|_2 \le \frac{C}{\varepsilon\delta}\, \|y\|_1\Big) \ge 1 - C\exp(-cn) - (\varepsilon/c\delta)^{\delta n}. \]

Note that to make the probability bound non-trivial, we have to assume that
$\varepsilon = c'\delta$ for some $0 < c' < c$. In this case the corollary means that a random
$n$-dimensional subspace $E$ satisfies

\[ \frac{1}{\sqrt N}\, B_2^N \cap E \subset B_1^N \cap E \subset \Big(\frac{C}{\delta^2}\Big) \cdot \frac{1}{\sqrt N}\, B_2^N. \]

This inclusion remains non-trivial as long as $C/\delta^2 < \sqrt N$, i.e., as long as
$\delta > cN^{-1/4}$.
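The passage from (8.3) to Corollary 8.5 can be spelled out as follows. This is a sketch that takes (8.3) as given and writes an arbitrary $y \in E$ as $y = Ax$; the left inequality is Cauchy-Schwarz in $\mathbb{R}^N$.

```latex
\begin{align*}
\|y\|_1 &\le \sqrt{N}\,\|y\|_2 && \text{(Cauchy-Schwarz)} \\
\sqrt{N}\,\|y\|_2 = \sqrt{N}\,\|Ax\|_2 &\le Cn\,\|x\|_2 && \text{(right side of (8.3))} \\
 &\le \frac{Cn}{\varepsilon\delta n}\,\|Ax\|_1 = \frac{C}{\varepsilon\delta}\,\|y\|_1 && \text{(left side of (8.3))}.
\end{align*}
```

With $\varepsilon = c'\delta$ the last factor becomes $C/(c'\delta^2)$, which is the source of the factor $C/\delta^2$ in the inclusion of sections.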

8.2. Short Khinchin inequality for p > 2. The case $p > 2$ requires a
completely different approach. In this case the assumption that the coordinates of
the random vector $X$ are independent becomes unnecessary. We shall assume
instead that $X$ is isotropic and subgaussian. The first property means that for
any $y \in S^{n-1}$

\[ \mathbb{E}\langle X, y\rangle^2 = 1, \]

while the second means that for any $y \in S^{n-1}$ the random variable $\langle X, y\rangle$ is
centered subgaussian. By Theorem 3.3, any random vector with independent
centered subgaussian coordinates of variance 1 is isotropic subgaussian. This
includes, in particular, an appropriately scaled random vertex of the discrete
cube $\{-1, 1\}^n$.

We prove the following Theorem [10].

Theorem 8.6. Let $X$ be an isotropic subgaussian vector in $\mathbb{R}^n$. Let $X_1, \dots, X_N$
be independent copies of $X$. Then for any $p > 2$ and any $N \ge n^{p/2}$, the
inequality

\[ c\|y\|_2 \le \Big(\frac1N \sum_{j=1}^N |\langle X_j, y\rangle|^p\Big)^{1/p} \le C\sqrt p\, \|y\|_2 \]

holds with high probability for all $y \in \mathbb{R}^n$.

Proof. As in the classical Khinchin inequality, the first inequality in
Theorem 8.6 is easy. Denote, as before, by $A$ the $N \times n$ matrix with rows $X_1, \dots, X_N$.
Assume that $n$ is large enough, so that $N \ge n^{p/2} \ge \delta_0^{-1} n$, where $\delta_0$ is the
constant from Proposition 4.7. Combining this Proposition with the inequality
$\|y\|_2 \le N^{1/2 - 1/p} \cdot \|y\|_p$ valid for all $y \in \mathbb{R}^N$, we obtain

\[ \mathbb{P}\Big(\min_{x \in S^{n-1}} \|Ax\|_p \le c_1 N^{1/p}\Big) \le e^{-c_2 N}, \]

which establishes the left inequality with probability exponentially close to 1.

To prove the second inequality, we use the method of majorizing measures,
or generic chaining, developed by Talagrand [32]. Let $\{X_t\}_{t \in T}$ be a real-valued
random process, i.e., a collection of interdependent random variables, indexed
by some set $T$. In the setup below, we can assume that $T$ is finite or countable,
eliminating the question of measurability of $\sup_{t \in T} X_t$. We shall call the process
$\{X_t\}_{t \in T}$ centered if $\mathbb{E}X_t = 0$ for all $t \in T$.

Definition 8.7. Let $(T, d)$ be a metric space. A random process $\{X_t\}_{t \in T}$ is
called subgaussian with respect to the metric $d$ if for any $t, s \in T$, $t \ne s$ the
random variable $(X_t - X_s)/d(t, s)$ is subgaussian. A random process $\{G_t\}_{t \in T}$
is called Gaussian with respect to the metric $d$ if for any finite set $F \subset T$
the joint distribution of $\{G_t\}_{t \in F}$ is Gaussian, and for any $t, s \in T$, $t \ne s$,
$(G_t - G_s)/d(t, s)$ is an $N(0, 1)$ random variable.

We use the following fundamental result due to Talagrand.

Theorem 8.8 (Majorizing Measure Theorem). Let $(T, d)$ be a metric space,
and let $\{G_t\}_{t \in T}$ be a Gaussian random process with respect to the metric $d$.
For any centered random process $\{X_t\}_{t \in T}$, which is subgaussian with respect to
the same metric,

\[ \mathbb{E}\sup_{t \in T} X_t \le C\, \mathbb{E}\sup_{t \in T} G_t. \]

For $(s, y) \in \mathbb{R}^N \times \mathbb{R}^n$ define the random variable $X_{s,y}$ by

\[ X_{s,y} = \sum_{j=1}^N s_j \langle X_j, y\rangle. \]

Then for any $T \subset B_2^N \times B_2^n$, the random process $\{X_{s,y}\}_{(s,y) \in T}$ is subgaussian
with respect to the Euclidean metric. Indeed, for any $(s, y), (s', y') \in T$,

\[ X_{s,y} - X_{s',y'} = \sum_{j=1}^N \big((s_j - s_j')\langle X_j, y\rangle + s_j'\langle X_j, y - y'\rangle\big). \]

Let $\lambda \in \mathbb{R}$. Since the vector $X$ is centered subgaussian, for any $z \in \mathbb{R}^n$,
$\mathbb{E}\exp(\lambda\langle X, z\rangle) \le \exp(C\lambda^2\|z\|_2^2)$. Hence, using independence of $X_j$ and applying
the Cauchy-Schwarz inequality, we get

\[ \mathbb{E}\exp\big(\lambda(X_{s,y} - X_{s',y'})\big)
 = \prod_{j=1}^N \mathbb{E}\Big[\exp\big(\lambda(s_j - s_j')\langle X_j, y\rangle\big) \cdot \exp\big(\lambda s_j'\langle X_j, y - y'\rangle\big)\Big] \]
\[ \le \prod_{j=1}^N \exp\big(2C\lambda^2 (s_j - s_j')^2 \|y\|_2^2\big) \cdot \prod_{j=1}^N \exp\big(2C\lambda^2 (s_j')^2 \|y - y'\|_2^2\big)
 \le \exp\big(2C\lambda^2(\|s - s'\|_2^2 + \|y - y'\|_2^2)\big). \]

By Theorem 3.2 this means that the random variable

\[ \frac{X_{s,y} - X_{s',y'}}{\|(s, y) - (s', y')\|_2} \]

is subgaussian.

Let $Y$ and $Z$ be independent standard Gaussian vectors in $\mathbb{R}^n$ and $\mathbb{R}^N$
respectively. Set

\[ G_{s,y} = \langle s, Z\rangle + \langle y, Y\rangle. \]

Then for any $T \subset \mathbb{R}^N \times \mathbb{R}^n$, $\{G_{s,y}\}_{(s,y) \in T}$ is a Gaussian process with respect to
the Euclidean metric. Let $1/p + 1/p^* = 1$, and set $T = B_{p^*}^N \times B_2^n \subset B_2^N \times B_2^n$.
By the Majorizing Measure Theorem

\[ \mathbb{E}\sup_{(s,y) \in T} X_{s,y} \le C\, \mathbb{E}\sup_{(s,y) \in T} G_{s,y}. \]

Therefore,

\[ \mathbb{E}\sup_{y \in B_2^n} \Big(\frac1N \sum_{j=1}^N |\langle X_j, y\rangle|^p\Big)^{1/p}
 = \frac{1}{N^{1/p}}\, \mathbb{E}\sup_{y \in B_2^n}\, \sup_{s \in B_{p^*}^N} \sum_{j=1}^N s_j \langle X_j, y\rangle
 \le \frac{C}{N^{1/p}}\, \mathbb{E}\sup_{(s,y) \in T} G_{s,y} \]
\[ = \frac{C}{N^{1/p}} \big(\mathbb{E}\|Z\|_p + \mathbb{E}\|Y\|_2\big)
 \le \frac{C}{N^{1/p}} \big(C\sqrt p\, N^{1/p} + \sqrt n\big). \]

Since $N \ge n^{p/2}$, the last expression does not exceed $C'\sqrt p$. To complete the
proof we combine this estimate of the expectation with Chebyshev's inequality.

□
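The factor $\sqrt p$ in the upper bound comes from the growth of Gaussian moments: for a standard Gaussian vector $Z$ in $\mathbb{R}^N$, Jensen's inequality gives $\mathbb{E}\|Z\|_p \le N^{1/p}(\mathbb{E}|g|^p)^{1/p}$ with $g \sim N(0, 1)$. The sketch below evaluates $(\mathbb{E}|g|^p)^{1/p}$ from the standard closed form $\mathbb{E}|g|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt\pi$ and compares it with $\sqrt p$; the function name is illustrative.

```python
import math


def gaussian_lp_moment(p):
    """(E|g|^p)^(1/p) for g ~ N(0, 1), via E|g|^p = 2^(p/2) Gamma((p+1)/2) / sqrt(pi)."""
    moment = 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)
    return moment ** (1 / p)


# Ratio of the p-th moment norm to sqrt(p) for a few exponents.
ratios = {p: gaussian_lp_moment(p) / math.sqrt(p) for p in (2, 4, 10, 50, 100)}
```

The ratios stay below 1 and approach $1/\sqrt e$ as $p$ grows, so $(\mathbb{E}|g|^p)^{1/p}$ is of order $\sqrt p$, and not smaller.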



Note that Theorem 8.6 implies that the matrix $A$ formed by the vectors
$X_1, \dots, X_N$ defines a subspace of $\ell_p^N$ which is close to Euclidean. This, in
particular, means that the bound $N \ge n^{p/2}$ is optimal (see e.g., [5] for details).

9. Random unitary and orthogonal perturbations 

The need for probabilistic bounds for the smallest singular value of a random 
matrix from a certain class arises in many intrinsic problems of the random 
matrix theory. Such bounds are the standard step in many proofs based on the 
convergence of Stieltjes transforms of the empirical measures to the Stieltjes 
transform of the limit measure. One of the examples, where such bounds 
become necessary is the Circular Law. The proof of this law requires the 
lower bound on the smallest singular value of a random matrix with i.i.d. 
entries, which was obtained above. Another setup, where such bounds become 
necessary, is provided by the Single Ring Theorem of Guionnet, Krishnapur 
and Zeitouni. The proof of this theorem deals with another natural class
of random matrices, namely random unitary or orthogonal perturbations of a 
fixed matrix. 

Let us consider the complex case first. Let D be a fixed n x n matrix, and let 
U be a random matrix uniformly distributed over the unitary group U(n). In 
this case the solution of the qualitative invertibility problem is trivial, since the 
matrix D + U is non-singular with probability 1. This can be easily concluded 
by considering the determinant of D + U. The determinant, however, provides 
a poor tool for studying the quantitative invertibility problem. In regard to 
this problem we will prove the following theorem. 

Theorem 9.1. Let $D$ be an arbitrary $n \times n$ matrix, $n \ge 2$. Let $U$ be a random
matrix uniformly distributed over the unitary group $U(n)$. Then

\[ \mathbb{P}\big(s_n(D + U) \le t\big) \le t^c n^C \quad \text{for all } t > 0. \]






Here C and c are absolute constants. 



An important feature of Theorem 9.1 is its independence of the matrix $D$.
This independence is essential for the Single Ring Theorem.

The statement similar to Theorem 9.1 fails in the real case, i.e., for random
matrices distributed over the orthogonal group. Indeed, suppose that $n$ is odd.
If $-D, U \in SO(n)$, then $-D^{-1}U \in SO(n)$ has the eigenvalue 1, and the matrix
$D + U = D(D^{-1}U + I_n)$ is singular. Therefore, if $U$ is uniformly distributed
over $O(n)$, then $s_n(D + U) = 0$ with probability at least $1/2$. Nevertheless, it
turns out that this is essentially the only obstacle to the extension of Theorem
9.1 to the orthogonal case.
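The obstruction can be seen concretely for $n = 3$. The sketch below takes $-D$ and $U$ to be rotations about two different axes (the angles are arbitrary choices made for this illustration) and checks that $D + U$ has zero determinant, while replacing $U$ by an orthogonal matrix of determinant $-1$ gives an invertible sum.

```python
import math


def rot_x(t):
    """Rotation by angle t about the first coordinate axis (an element of SO(3))."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(t):
    """Rotation by angle t about the third coordinate axis (an element of SO(3))."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def add(a, b):
    """Entrywise sum of two 3 x 3 matrices."""
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]


D = [[-v for v in row] for row in rot_x(0.7)]   # -D = rot_x(0.7) lies in SO(3)
U_rot = rot_z(1.3)                              # det U = +1
U_ref = [U_rot[0], U_rot[1], [0.0, 0.0, -1.0]]  # orthogonal with det U = -1

det_rot = det3(add(D, U_rot))  # vanishes up to rounding: D + U is singular
det_ref = det3(add(D, U_ref))  # bounded away from zero for these angles
```

Only the component $SO(3)$ of $O(3)$ produces singular sums in this example, which matches the "probability at least $1/2$" in the text.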



Theorem 9.2 (Orthogonal perturbations). Let $D$ be a fixed $n \times n$ real matrix,
$n \ge 2$. Assume that

\[ \|D\| \le K, \qquad \inf_{V \in O(n)} \|D - V\| \ge \delta \tag{9.1} \]

for some $K \ge 1$, $\delta \in (0, 1)$. Let $U$ be a random matrix uniformly distributed
over the orthogonal group $O(n)$. Then

\[ \mathbb{P}\big(s_n(D + U) \le t\big) \le t^c (Kn/\delta)^C, \quad t > 0. \]

Similarly to the complex case, this bound is uniform over all matrices $D$
satisfying (9.1). This condition is relatively mild: in the case when $K = n^{C_1}$
and $\delta = n^{-C_2}$ for some constants $C_1, C_2 > 0$, we have

\[ \mathbb{P}\big(s_n(D + U) \le t\big) \le t^c n^C, \quad t > 0, \]

as in the complex case. It is possible that the condition $\|D\| \le K$ can be
eliminated from Theorem 9.2. However, this is not crucial because such a
condition already appears in the Single Ring Theorem.



The problems we face in the proofs of Theorems 9.1 and 9.2 are significantly
different from those appearing in Sections 5-7. In the case of the independent
entries the argument was based on the analysis of the small ball probability
$\mathbb{P}(\|Ax\|_2 < t)$ or $\mathbb{P}(\|Ax\|_1 < t)$ for a fixed vector $x$. As shown in Section 6, the
decay of this probability as $t \to 0$ is determined by the arithmetic structure
of the coordinates of $x$. In contrast to this, the arithmetic structure plays no
role in Theorems 9.1 and 9.2. The difficulty lies elsewhere, namely in the lack
of independence of the entries of the matrix. We will have to introduce a set
of independent random variables artificially. These variables have to be
chosen in a way that allows one to express tractably the smallest singular value
in terms of them. To illustrate this approach, we present the proof of Theorem
9.1 below. The proof of Theorem 9.2 starts with similar ideas, but requires
new and significantly more delicate arguments. We refer the reader to [28] for
the details.

Proof of Theorem 9.1. Throughout the proof we fix $t > 0$ and introduce several
small and large parameters depending on $t$. The values of such parameters will
be chosen of orders $t^a$, where $0 < a < 1$ for the small parameters, and $t^{-b}$,
$0 < b < 1$ for the large ones. This would allow us to introduce a hierarchy of
parameters, and disregard the terms corresponding to the smaller ones. Also,
note that we have to prove Theorem 9.1 only for $t < n^{-C'}$ for a given constant
$C'$, because for larger values of $t$ its statement can be made vacuous by choosing
a large constant $C$. This observation would allow us to use bounds of the type
$\sqrt n\, t^{a'} \le t^a$ whenever $a < a'$ are constants.

For the reader's convenience, we include a special paragraph entitled "Choice of the
parameters" in the analysis of each case. In these paragraphs we list the constraints that
the small and large parameters must satisfy, as well as the admissible numerical values
of those parameters. These paragraphs will be printed in sans-serif and can be omitted
on the first reading.

To simplify the argument, we will also assume that $\|D\| \le K$, as in Theorem
9.2. The proof of Theorem 9.1 without this assumption can be found in

9.1. Decomposition of the sphere and introduction of local and global
perturbations. We have to bound $s_n(U + D)$, which is the minimum of
$\|(D + U)x\|_2$ over the unit sphere. For every $x \in S^{n-1}$, there is a
coordinate $x_j$ with $|x_j| \ge 1/\sqrt{n}$. Hence, the union bound yields

\[
\mathbb{P}(s_n(D + U) \le t)
\le \sum_{j=1}^{n} \mathbb{P}\Bigl( \inf_{x \in S_j} \|(U + D)x\|_2 \le t \Bigr),
\]

where

\[
S_j = \{ x \in S^{n-1} \mid |x_j| \ge 1/\sqrt{n} \}.
\]
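The covering of the sphere by the sets $S_j$ rests on a pigeonhole argument: if
all coordinates of a unit vector satisfied $|x_j| < 1/\sqrt{n}$, the sum of
their squares would be less than 1. The sketch below checks this on random
complex unit vectors; the helper names and the sampling scheme are assumptions
made only for illustration.

```python
import math
import random

def largest_coordinate_index(x):
    """Index j maximizing |x_j|; for a unit vector, |x_j| >= 1/sqrt(n)."""
    return max(range(len(x)), key=lambda j: abs(x[j]))

def random_unit_vector(n, rng):
    """Random complex unit vector (normalized Gaussian), used only for illustration."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / norm for z in v]

rng = random.Random(0)
n = 50
for _ in range(1000):
    x = random_unit_vector(n, rng)
    j = largest_coordinate_index(x)
    # x belongs to S_j; the tolerance only absorbs rounding errors.
    assert abs(x[j]) >= 1 / math.sqrt(n) - 1e-12
```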

All terms on the right hand side of the inequality above can be estimated in the
same way. So, without loss of generality we will consider the case $j = 1$. Note
that the application of the crude union bound here may have increased the
probability estimate of Theorem 9.1 by a factor of $n$. This, however, is
unimportant, since we allow the coefficient $n^{C}$ anyway.

The proof of the theorem reduces to the estimate of

\[
\mathbb{P}\Bigl( \inf_{x \in S_1} \|(U + D)x\|_2 \le t \Bigr). \tag{9.2}
\]

The structure of the set $S_1$ gives a special role to the first coordinate.
This will be reflected in our choice of independent random variables. If
$R, W \in U(n)$ are any matrices, and $V$ is uniformly distributed over $U(n)$,
then the matrix $U = V^{-1} R^{-1} W$ is uniformly distributed over $U(n)$ as
well. Hence, if we assume that the matrices $R$ and $W$ are random and
independent of $V$, then this property remains valid for $U$. The choice of the
distributions of $R$ and $W$ is in our hands. Set

\[
R = \mathrm{diag}(r, 1, \dots, 1),
\]

where $r$ is a random variable uniformly distributed over
$\{ z \in \mathbb{C} : |z| = 1 \}$. This is a "global" perturbation, since we
will need the values of $r$ which are far from 1. The matrix $W$ will be
"local", i.e., it will be a small perturbation of the identity matrix. Let
$\varepsilon > 0$ be a "small" parameter, and set $W = \exp(\varepsilon S)$,
where $S$ is an $n \times n$ skew-Hermitian matrix, i.e. $S^* = -S$. Although the
matrix $W$ is unitary, the dependence of its entries on the entries of $S$ is
hard to trace. To simplify the structure, we consider the linearization of $W$,

\[
W_0 = I + \varepsilon S.
\]

The matrix $W_0$ is not unitary, but its distance to the group $U(n)$ is at most
$\|W - W_0\| \le \varepsilon^2 \|S\|^2$. Thus, for any $x \in S_1$,

\[
\begin{aligned}
\|(D + U)x\|_2 &= \|(D + V^{-1} R^{-1} W)x\|_2 = \|(RVD + W)x\|_2 \\
&\ge \|(RVD + W_0)x\|_2 - \|W - W_0\| \\
&\ge \|(RVD + I + \varepsilon S)x\|_2 - \varepsilon^2 \|S\|^2 .
\end{aligned}
\]
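The bound on the distance between $W = \exp(\varepsilon S)$ and its
linearization $W_0 = I + \varepsilon S$ can be verified through the spectral
decomposition. The following derivation is a sketch added for completeness and
is not part of the original argument. Since $S$ is skew-Hermitian, it is normal
with purely imaginary eigenvalues $\sqrt{-1}\,\theta_j$, where
$\max_j |\theta_j| = \|S\|$, and $W - W_0$ is diagonal in the same orthonormal
basis. Using $|e^{\sqrt{-1}\,\theta} - 1 - \sqrt{-1}\,\theta| \le \theta^2/2$
for real $\theta$, we get:

```latex
\[
\|W - W_0\|
= \max_j \bigl| e^{\sqrt{-1}\,\varepsilon\theta_j} - 1 - \sqrt{-1}\,\varepsilon\theta_j \bigr|
\le \max_j \frac{(\varepsilon\theta_j)^2}{2}
= \frac{\varepsilon^2 \|S\|^2}{2}
\le \varepsilon^2 \|S\|^2 .
\]
```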

We will use $S$ to introduce a collection of independent random variables. Set

\[
S = \begin{bmatrix} \sqrt{-1}\, s & -Z^T \\ Z & 0 \end{bmatrix}, \tag{9.3}
\]

where $s \sim N_{\mathbb{R}}(0, 1)$ and $Z \sim N_{\mathbb{R}}(0, I_{n-1})$ are
an independent real-valued standard normal random variable and vector,
respectively. Clearly, $S$ is skew-Hermitian.

If $K_0$ is a "large" parameter, $K_0 = t^{-b_0}$, then by Proposition 4.4

\[
\mathbb{P}(\|Z\|_2 \ge K_0 \sqrt{n}) \le \exp(-c K_0^2 n) \le t
\]

for all sufficiently small $t > 0$. This means that $\|S\| \le K_0 \sqrt{n}$
with probability close to 1. Disregarding an event of small probability, we
reduce the problem to obtaining a lower bound for

\[
\inf_{x \in S_1} \|(RVD + I + \varepsilon S)x\|_2 ,
\]

provided that the bound we obtain is of order at least $\varepsilon$. Indeed, we
may assume that $K_0^2 n \varepsilon^2 \ll \varepsilon$, if $\varepsilon$ is
chosen small enough.

Choice of the parameters. The second order term $\varepsilon^2 \|S\|^2$ should
not affect the estimate of $\mathbb{P}(\inf_{x \in S_1} \|Ax\|_2 \le t)$. To
guarantee it, we require that

\[
K_0^2 n \varepsilon^2 \le t/2.
\]

Also, to bound the probability by a power of $t$, we have to assume that

\[
\exp(-c K_0^2 n) \le t^{c}
\]

for some $c > 0$. Both inequalities are satisfied for small $t$ if
$\varepsilon = t^{0.6}$ and $K_0 = t^{-0.05}$.
Starting from this moment we will condition on the matrix $V$ and evaluate
the conditional probability with respect to the random matrices $R$ and $S$. The
original random structure will be lost after this conditioning. However, we
introduced a new independent structure in the form of the matrices $R$ and $S$,
and it will be easier to manipulate. Each of the matrices $R$ and $S$ alone is
insufficient to obtain any meaningful estimate. Nevertheless, the combination
of these two sources of randomness, a local perturbation $S$ and a global
perturbation $R$, produces enough power to conclude that
$RVD + I + \varepsilon S$ is typically well invertible, and this leads to the
proof of Theorem 9.1.



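Summarizing the previous argument, we conclude that our goal is to bound

\[
\mathbb{P}\Bigl( \inf_{x \in S_1} \|Ax\|_2 \le t \Bigr),
\]

where

\[
A = RVD + I + \varepsilon S
= \begin{bmatrix} A_{11} & Y^T \\ X & B^T \end{bmatrix}, \tag{9.4}
\]

$X, Y \in \mathbb{C}^{n-1}$, $B$ is an $(n-1) \times (n-1)$ matrix, and
$\varepsilon = t^{a}$. Here we decomposed the matrix $A$, separating the first
coordinate to emphasize its special role. For future reference we write $A$ in
terms of the components of the matrix $VD$ and the random variables $r$, $s$,
and $Z$, exposing the dependence on these random parameters:

\[
A = \begin{bmatrix} A_{11} & Y^T \\ X & B^T \end{bmatrix}
= \begin{bmatrix}
ra + 1 + \sqrt{-1}\,\varepsilon s & (rv - \varepsilon Z)^T \\
u + \varepsilon Z & B^T
\end{bmatrix}. \tag{9.5}
\]

Here $a \in \mathbb{C}$, $u, v \in \mathbb{C}^{n-1}$, and the matrix $B$ are
independent of $r$, $s$, and $Z$. After conditioning on $V$, we can treat them
as constants.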

The further strategy takes into account the properties of the matrix B. 
Depending on the invertibility properties of this matrix, we condition on some 
of the random variables r, s, and Z, and use the other ones to show that A is 
well-invertible with high probability. 

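9.2. Case 1: B is poorly invertible. Assume that
$s_n(B) \le \lambda_1 \varepsilon$, where $\lambda_1$ is another "small"
parameter ($\lambda_1 = t^{a_1}$ for $0 < a_1 < 1$). In this case we will
condition on $r$ and $s$, and rely on $Z$ to obtain the probability bound. We
know that there exists a vector $\tilde{w} \in S^{n-2}$ such that
$\|B\tilde{w}\|_2 \le \lambda_1 \varepsilon$. Let $x \in S_1$ be arbitrary. We
can express it as

\[
x = \begin{bmatrix} x_1 \\ \tilde{x} \end{bmatrix},
\quad \text{where } |x_1| \ge \frac{1}{\sqrt{n}}.
\]

Set

\[
w = \begin{bmatrix} 0 \\ \tilde{w} \end{bmatrix} \in \mathbb{C}^{n}.
\]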
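Using the decomposition of $A$ given in (9.4), we obtain

\[
\begin{aligned}
\|Ax\|_2 \ge |w^T A x|
&= \left| \begin{bmatrix} 0 & \tilde{w}^T \end{bmatrix}
\begin{bmatrix} A_{11} & Y^T \\ X & B^T \end{bmatrix}
\begin{bmatrix} x_1 \\ \tilde{x} \end{bmatrix} \right|
= |x_1 \cdot \tilde{w}^T X + \tilde{w}^T B^T \tilde{x}| \\
&\ge |x_1| \cdot |\tilde{w}^T X| - \|B\tilde{w}\|_2
\quad \text{(by the triangle inequality)} \\
&\ge \frac{1}{\sqrt{n}} |\tilde{w}^T X| - \lambda_1 \varepsilon
\quad \text{(using } |x_1| \ge 1/\sqrt{n} \text{)}.
\end{aligned}
\]

By the representation (9.5), $X = u + \varepsilon Z$, where
$u \in \mathbb{C}^{n-1}$ is a vector independent of $Z$. Taking the infimum over
$x \in S_1$, we obtain

\[
\inf_{x \in S_1} \|Ax\|_2
\ge \frac{1}{\sqrt{n}} |\tilde{w}^T u + \varepsilon \tilde{w}^T Z|
- \lambda_1 \varepsilon.
\]

Recall that $\tilde{w}$, $u$ are fixed vectors, $\|\tilde{w}\|_2 = 1$, and
$Z \sim N_{\mathbb{R}}(0, I_{n-1})$. Then $\tilde{w}^T Z = \gamma$ is a complex
normal random variable of variance 1: $\mathbb{E}|\gamma|^2 = 1$. This means
that $\mathbb{E}(\mathrm{Re}\,\gamma)^2 \ge 1/2$ or
$\mathbb{E}(\mathrm{Im}\,\gamma)^2 \ge 1/2$. A quick density calculation yields
the following bound on the conditional probability:

\[
\mathbb{P}_Z\bigl\{ |\tilde{w}^T u + \varepsilon \tilde{w}^T Z|
\le 2\lambda_1 \varepsilon \sqrt{n} \bigr\} \le C \lambda_1 \sqrt{n}.
\]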
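The density calculation can be made explicit. Suppose, for definiteness, that
the real part of $\gamma$ has variance $\sigma^2 \ge 1/2$. The event
$|\tilde{w}^T u + \varepsilon\gamma| \le \mu$ implies
$|c + \varepsilon\sigma g| \le \mu$ with $c = \mathrm{Re}(\tilde{w}^T u)$ and
$g$ a real standard normal variable, so its probability is at most the interval
length $2\mu/(\varepsilon\sigma)$ times the maximal density $1/\sqrt{2\pi}$.
For $\mu = 2\lambda_1\varepsilon\sqrt{n}$ and $\sigma^2 = 1/2$ this gives
$(4/\sqrt{\pi})\,\lambda_1\sqrt{n}$. The sketch below compares the exact
probability with this bound; the function names and the numerical values are
illustrative assumptions, not taken from the text.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interval_probability(c: float, eps: float, sigma: float, mu: float) -> float:
    """P(|c + eps * sigma * g| <= mu) for a real standard normal g."""
    scale = eps * sigma
    return normal_cdf((mu - c) / scale) - normal_cdf((-mu - c) / scale)

def density_bound(eps: float, sigma: float, mu: float) -> float:
    """Interval length 2*mu/(eps*sigma) times the maximal normal density."""
    return 2.0 * mu / (eps * sigma * math.sqrt(2.0 * math.pi))

# Illustrative values (assumptions): eps = 0.004, n = 4, lambda_1 = 0.01.
eps, n, lam1 = 0.004, 4, 0.01
sigma = math.sqrt(0.5)                  # worst admissible variance 1/2
mu = 2 * lam1 * eps * math.sqrt(n)      # the threshold used in Case 1
for c in (0.0, 0.001, -0.01, 1.0):      # c plays the role of Re(w^T u)
    assert interval_probability(c, eps, sigma, mu) <= density_bound(eps, sigma, mu)
```

The bound does not depend on the shift $c$, which is why the vector $u$ can be
treated as an arbitrary constant.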

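Therefore, a similar bound holds unconditionally. Thus, combining the previous
estimates, we conclude that in the case when $s_n(B) \le \lambda_1 \varepsilon$,
and if $\varepsilon$ and $\lambda_1$ are chosen so that
$\lambda_1 \varepsilon \ge t$, we have

\[
\begin{aligned}
\mathbb{P}\Bigl( \inf_{x \in S_1} \|Ax\|_2 \le t \Bigr)
&\le \mathbb{P}\Bigl( \frac{1}{\sqrt{n}} |\tilde{w}^T X|
- \lambda_1 \varepsilon \le t \Bigr) \\
&\le \mathbb{P}\bigl\{ |\tilde{w}^T u + \varepsilon \tilde{w}^T Z|
\le 2\lambda_1 \varepsilon \sqrt{n} \bigr\}
\le C \lambda_1 \sqrt{n} = C \sqrt{n} \cdot t^{a_1}.
\end{aligned}
\]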



Choice of the parameters. The constraint

\[
\lambda_1 \varepsilon \ge t,
\]

appearing in this case, holds if we take $\lambda_1 = t^{0.1}$.

9.3. Case 2: B is nicely invertible. Assume that $s_n(B) \ge \lambda_2$, where
$\lambda_2 = t^{a_2}$ is a "small" parameter. In this case, we will also use
only the local perturbation; however, the crucial random variable will be
different. We will condition on $r$ and $Z$, and use the dependence on $s$ to
derive the conclusion of the theorem.

Set

\[
M = \begin{bmatrix} 1 & 0 \\ 0 & (B^T)^{-1} \end{bmatrix},
\]

then $\|M\| \le \lambda_2^{-1}$. Therefore,

\[
\inf_{x \in S_1} \|Ax\|_2 \ge \lambda_2 \inf_{x \in S_1} \|MAx\|_2 .
\]
The matrix $MA$ has the following block representation:

\[
MA = \begin{bmatrix} A_{11} & Y^T \\ (B^T)^{-1} X & I_{n-1} \end{bmatrix}.
\]

Recall that we assumed that $\|D\| \le K$, where $K$ is a constant. Combining
this with the already used inequality $\|Z\|_2 \le K_0 \sqrt{n}$, which holds
outside of an event of exponentially small probability, we conclude that

\[
\|Y\|_2 \le \|v\|_2 + \varepsilon \|Z\|_2 \le 2K
\]

if $\varepsilon K_0 \sqrt{n} \le K$. To bound $\inf_{x \in S_1} \|Ax\|_2$, we
use the observation that

\[
\begin{bmatrix} 1 & -Y^T \end{bmatrix}
\begin{bmatrix} Y^T \\ I_{n-1} \end{bmatrix} = 0.
\]

This implies that for every $x \in S_1$,

\[
\begin{aligned}
\|MAx\|_2
&\ge \frac{1}{\bigl\| \begin{bmatrix} 1 & -Y^T \end{bmatrix} \bigr\|_2}
\Bigl| \begin{bmatrix} 1 & -Y^T \end{bmatrix} MA x \Bigr| \\
&\ge \frac{1}{2K} \, |A_{11} - Y^T (B^T)^{-1} X| \cdot |x_1| \\
&\ge \frac{1}{2K\sqrt{n}} \, |A_{11} - Y^T (B^T)^{-1} X| .
\end{aligned}
\]

The right hand side of this inequality does not depend on $x$, so we can take
the infimum over $x \in S_1$ in the left hand side. Combining the previous two
inequalities, we get

\[
\inf_{x \in S_1} \|Ax\|_2
\ge \frac{\lambda_2}{2K\sqrt{n}} \, |A_{11} - Y^T (B^T)^{-1} X| .
\]

Recall that according to (9.5), $A_{11} = \sqrt{-1}\,\varepsilon s + d$, where
$s$ is a real $N(0, 1)$ random variable, and $d$ is independent of $s$.
Conditioning on everything but $s$, we can treat $d$ and $Y^T (B^T)^{-1} X$ as
constants. An elementary estimate using the normal density yields

\[
\mathbb{P}_s\bigl( |A_{11} - Y^T (B^T)^{-1} X| \le \mu \bigr)
\le C \frac{\mu}{\varepsilon} \quad \text{for all } \mu > 0.
\]
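The elementary estimate can be spelled out as follows; this is a sketch with an
explicit constant that the text does not track. Write
$d' = d - Y^T (B^T)^{-1} X$, which is a constant after the conditioning, so that
$A_{11} - Y^T (B^T)^{-1} X = \sqrt{-1}\,\varepsilon s + d'$. Since
$|z| \ge |\mathrm{Im}\, z|$, the event forces $s$ into an interval of length
$2\mu/\varepsilon$, and the standard normal density is at most
$1/\sqrt{2\pi}$:

```latex
\[
\mathbb{P}_s\bigl( |\sqrt{-1}\,\varepsilon s + d'| \le \mu \bigr)
\le \mathbb{P}_s\bigl( |\varepsilon s + \mathrm{Im}\, d'| \le \mu \bigr)
\le \frac{2\mu}{\varepsilon} \cdot \frac{1}{\sqrt{2\pi}}
= \sqrt{\frac{2}{\pi}} \, \frac{\mu}{\varepsilon} .
\]
```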



Applying this estimate with $\mu = \frac{2K\sqrt{n}}{\lambda_2} \cdot t$ and
integrating over the other random variables, we obtain

\[
\mathbb{P}\Bigl( \inf_{x \in S_1} \|Ax\|_2 \le t \Bigr)
\le C \, \frac{2K\sqrt{n}}{\lambda_2 \varepsilon} \cdot t
\le C \sqrt{n} \cdot t^{c}
\]

for some $c > 0$ if $\lambda_2$ is chosen appropriately.

Choice of the parameters. The inequality

\[
\frac{1}{\lambda_2 \varepsilon} \, t \le t^{c}, \quad c > 0,
\]

holds with $c = 0.2$ if we set $\lambda_2 = t^{0.2}$. The constraint

\[
\varepsilon K_0 \sqrt{n} \le K,
\]

appearing above, is satisfied since we have chosen $\varepsilon = t^{0.6}$ and
$K_0 = t^{-0.05}$.

One can try to tweak the parameters $\lambda_1$, $\lambda_2$, and $\varepsilon$
to cover all possible scenarios. This attempt, however, is doomed to fail since
the system of constraints becomes inconsistent. Indeed, to include all matrices
$B$ in Cases 1 and 2, we have to choose $\lambda_2 \le \lambda_1 \varepsilon$.
With this choice,

\[
\frac{1}{\lambda_2 \varepsilon} \, t \ge \frac{1}{\lambda_1 \varepsilon^2} \, t > 1
\]

because of the constraint $K_0^2 n \varepsilon^2 \le t/2$. This forces us to
consider the intermediate case.
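The inconsistency can also be seen by bookkeeping of exponents. Writing
$\varepsilon = t^{a}$, $\lambda_1 = t^{a_1}$, $\lambda_2 = t^{a_2}$ and keeping
only the leading order in $t$ (the factors of $n$ and $K_0$ are ignored, which
only weakens the constraints), the three requirements become $2a > 1$,
$a_2 \ge a_1 + a$, and $1 - a_2 - a > 0$. The following sketch searches a grid
of exponents for an admissible triple; the function name and the grid are
illustrative assumptions.

```python
def feasible_exponents(step_hundredths: int = 5):
    """Search exponents (a, a1, a2), in hundredths, satisfying all three constraints.

    Leading-order conditions as t -> 0 (factors of n and K_0 ignored):
      * linearization:  eps**2 <= t/2             requires 2*a > 1
      * coverage:       lambda_2 <= lambda_1*eps  requires a2 >= a1 + a
      * Case 2 bound:   t/(lambda_2*eps) <= t**c  requires 1 - a2 - a > 0
    """
    grid = range(step_hundredths, 100, step_hundredths)  # exponents in (0, 1)
    found = []
    for a in grid:
        for a1 in grid:
            for a2 in grid:
                linearization = 2 * a > 100
                coverage = a2 >= a1 + a
                case2 = 100 - a2 - a > 0
                if linearization and coverage and case2:
                    found.append((a / 100, a1 / 100, a2 / 100))
    return found

print(feasible_exponents())  # the search finds no admissible triple
```

The search comes back empty because $a_2 + a \ge a_1 + 2a > 1$ whenever the
first two conditions hold, which contradicts the third.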

9.4. Case 3, intermediate: B is invertible, but not nicely invertible.

Assume that $\lambda_1 \varepsilon \le s_n(B) \le \lambda_2$ with
$\lambda_2, \lambda_1$ defined in Cases 1 and 2. This is the most delicate case.
Here we will have to rely on both local and global perturbations. We proceed
like in Case 2 by multiplying $Ax$ from the left by a vector which eliminates
the dependence on all coordinates of $x$, except the first one. To this end,
note that

\[
\begin{bmatrix} 1 & -Y^T (B^T)^{-1} \end{bmatrix}
\begin{bmatrix} Y^T \\ B^T \end{bmatrix} = 0.
\]

Hence, for any $x \in S_1$,

\[
\begin{aligned}
\|Ax\|_2
&\ge \frac{1}{\bigl\| \begin{bmatrix} 1 & -Y^T (B^T)^{-1} \end{bmatrix} \bigr\|_2}
\left| \begin{bmatrix} 1 & -Y^T (B^T)^{-1} \end{bmatrix}
\begin{bmatrix} A_{11} & Y^T \\ X & B^T \end{bmatrix} x \right| \\
&\ge \frac{1}{1 + \|Y^T (B^T)^{-1}\|_2}
\, \bigl| (A_{11} - Y^T (B^T)^{-1} X) x_1 \bigr| \\
&\ge \frac{1}{1 + \|Y^T (B^T)^{-1}\|_2}
\, |A_{11} - Y^T (B^T)^{-1} X| \cdot \frac{1}{\sqrt{n}} .
\end{aligned}
\]

Since the right hand side is independent of $x$, we can take the infimum over
$x \in S_1$.

Note that $Y^T (B^T)^{-1}$ is independent of $s$, see (9.5). We consider two
subcases. If $\|Y^T (B^T)^{-1}\|_2 \le \lambda_2^{-1}$, then

\[
\inf_{x \in S_1} \|Ax\|_2
\ge \frac{\lambda_2}{2\sqrt{n}} \, |A_{11} - Y^T (B^T)^{-1} X| ,
\]

and we can finish the proof exactly like in Case 2, by conditioning on
everything except $s$, and estimating the probability with respect to $s$.

The second subcase requires more work. Assume that
$\|Y^T (B^T)^{-1}\|_2 > \lambda_2^{-1}$. Then the inequality above yields

\[
\inf_{x \in S_1} \|Ax\|_2
\ge \frac{1}{2\sqrt{n}\, \|Y^T (B^T)^{-1}\|_2}
\, |A_{11} - Y^T (B^T)^{-1} X| .
\]
Since we do not have a satisfactory upper bound for $\|Y^T (B^T)^{-1}\|_2$, we
cannot rely on $A_{11}$ to estimate the small ball probability. The second term
in the numerator looks more promising, because it contains the same vector
$Y^T (B^T)^{-1}$. This term, however, is difficult to analyze, since the random
vectors $X$ and $Y$ are dependent. A simplification of both numerator and
denominator would allow us to get rid of this dependence.

We start with analyzing the denominator. By (9.5), $Y = rv - \varepsilon Z$, so

\[
\|Y^T (B^T)^{-1}\|_2
\le \|v^T (B^T)^{-1}\|_2 + \varepsilon \|Z^T (B^T)^{-1}\|_2 .
\]

As in the previous cases, disregarding an event of small probability, we can
assume that $\|Z\|_2 \le K_0 \sqrt{n}$. Then by the assumption on $s_n(B)$,

\[
\varepsilon \|Z^T (B^T)^{-1}\|_2
\le \frac{\varepsilon K_0 \sqrt{n}}{s_n(B)}
\le \frac{K_0 \sqrt{n}}{\lambda_1} .
\]

The parameters $K_0$, $\lambda_1$, and $\lambda_2$ can be chosen so that
$\frac{K_0 \sqrt{n}}{\lambda_1} \le \lambda_2^{-1}/2$. Then, since
$\|Y^T (B^T)^{-1}\|_2 > \lambda_2^{-1}$, we conclude that

\[
\|Y^T (B^T)^{-1}\|_2 \le 2 \|v^T (B^T)^{-1}\|_2
\]

and

\[
\inf_{x \in S_1} \|Ax\|_2
\ge \frac{1}{4\sqrt{n}\, \|v^T (B^T)^{-1}\|_2}
\cdot |A_{11} - Y^T (B^T)^{-1} X| .
\]

The denominator here is independent of our random parameters.

Now we pass to the analysis of the numerator. It follows from (9.5) that
$A_{11} - Y^T (B^T)^{-1} X = \alpha r + \beta$ is a linear function of $r$ with
coefficients $\alpha$ and $\beta$, which depend on the other random parameters.
This representation allows us to filter out several complicated terms in
$A_{11} - Y^T (B^T)^{-1} X$ by using the global perturbation $r$.

Let $\lambda_3 > 0$ be a "small" parameter: $\lambda_3 = t^{a_3}$. Condition on
everything except $r$. Since $r$ is uniformly distributed over the unit circle
in $\mathbb{C}$, an easy density calculation yields

\[
\mathbb{P}_r\bigl( |\alpha r + \beta| \ge \lambda_3 |\alpha| \bigr)
\ge 1 - C \lambda_3 . \tag{9.6}
\]
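The estimate (9.6) has a simple geometric meaning. For $\alpha \ne 0$, the
event $|\alpha r + \beta| < \lambda_3 |\alpha|$ says that $r$ lies in a disc of
radius $\lambda_3$ centered at $-\beta/\alpha$. For $\lambda_3 < 1$ such a disc
cuts from the unit circle an arc with chord at most $2\lambda_3$, that is, of
angle at most $2\arcsin\lambda_3$, so the probability is at most
$\arcsin(\lambda_3)/\pi \le \lambda_3/2$. The sketch below checks this on a
deterministic grid of points of the circle; the function names and the sample
values are illustrative assumptions, not taken from the text.

```python
import cmath
import math

def arc_fraction(alpha: complex, beta: complex, lam: float,
                 num_points: int = 100_000) -> float:
    """Fraction of equally spaced r on the unit circle with |alpha*r + beta| < lam*|alpha|."""
    hits = 0
    for k in range(num_points):
        r = cmath.exp(2j * math.pi * k / num_points)
        if abs(alpha * r + beta) < lam * abs(alpha):
            hits += 1
    return hits / num_points

def arc_bound(lam: float) -> float:
    """Normalized length of the longest arc of the unit circle inside a disc of radius lam < 1."""
    return math.asin(lam) / math.pi   # at most lam / 2

# Illustrative values (assumptions). The extremal position of the disc center
# is at distance sqrt(1 - lam**2) from the origin.
lam = 0.1
alpha = 2 - 1j
beta_worst = -alpha * math.sqrt(1 - lam ** 2)
num_points = 100_000
# The slack 3/num_points accounts only for the discretization of the circle.
assert arc_fraction(alpha, beta_worst, lam, num_points) <= arc_bound(lam) + 3 / num_points
assert arc_bound(lam) <= lam / 2
```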

Taking the expectation with respect to the other random variables shows that
the same bound holds unconditionally. Thus, disregarding an event of small
probability $C\lambda_3$, we obtain that
$|A_{11} - Y^T (B^T)^{-1} X| \ge \lambda_3 |\alpha|$. The coefficient $\alpha$
in turn can be represented as follows:
$\alpha = \alpha' - \varepsilon v^T (B^T)^{-1} Z$, where
$\alpha' \in \mathbb{C}$ is independent of $Z$. Incorporating this into the
bound above, we obtain

\[
\inf_{x \in S_1} \|Ax\|_2
\ge \frac{\lambda_3}{4\sqrt{n}\, \|v^T (B^T)^{-1}\|_2}
\, \bigl| \alpha' - \varepsilon v^T (B^T)^{-1} Z \bigr| .
\]

Using the global perturbation allowed us to simplify the numerator and expose
its dependence on the local perturbation variable $Z$. We will finish the proof
using this local perturbation.

Set $h^T = v^T (B^T)^{-1} / \|v^T (B^T)^{-1}\|_2$ and recall that
$h \in \mathbb{C}^{n-1}$ is independent of $Z$. Conditioning on everything
except $Z$, we see that

\[
g := \frac{\alpha'}{\|v^T (B^T)^{-1}\|_2} - \varepsilon h^T Z
= \mathrm{const} + \varepsilon \gamma' ,
\]

where $\gamma'$ is a complex normal random variable of unit variance:
$\mathbb{E}|\gamma'|^2 = 1$. Hence, as before, for any $\mu > 0$

\[
\mathbb{P}_Z(|g| \le \mu) \le C \mu / \varepsilon ,
\]

and integrating over the other random variables, we conclude that the same
estimate holds unconditionally. Combining this inequality with the previous one
and recalling that we dropped an event of probability $C\lambda_3$ while using
(9.6), we obtain

\[
\mathbb{P}\Bigl( \inf_{x \in S_1} \|Ax\|_2 \le t \Bigr)
\le \mathbb{P}\Bigl( |g| \le \frac{4\sqrt{n}}{\lambda_3} \, t \Bigr) + C\lambda_3
\le C \, \frac{\sqrt{n}}{\lambda_3 \varepsilon} \, t + C\lambda_3
\le C' \sqrt{n} \, t^{c'}
\]

for some $c' > 0$. Choosing appropriate constants $a$ and $a_3$ in
$\varepsilon = t^{a}$ and $\lambda_3 = t^{a_3}$ finishes the proof in this case
and completes the proof of Theorem 9.1.

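Choice of the parameters. The analysis of this case requires the following two
constraints:

\[
\frac{K_0 \sqrt{n}}{\lambda_1} \le \frac{\lambda_2^{-1}}{2}
\quad \text{and} \quad
\frac{t}{\lambda_3 \varepsilon} + \lambda_3 \le t^{c}, \quad c > 0.
\]

The first one is satisfied with the choice $K_0 = t^{-0.05}$,
$\lambda_1 = t^{0.1}$, $\lambda_2 = t^{0.2}$ that we made above. To satisfy the
second one, set $\lambda_3 = t^{0.2}$. □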

We made no effort to optimize the dependence on t and n in the proof above. 
It would be interesting to find the optimal bound here. Another interesting 
question, suggested by Djalil Chafai, is to analyze the behavior of the smallest 
singular value of the matrix D + U where U is uniformly distributed over a 
discrete subgroup of the unitary group. The case of the permutation group 
may be of special interest, because of its relevance for random graph theory. 
This question may require a combination of tools from Sections 5-9, since
both obstacles, the arithmetic structure and the lack of independence, make
an appearance here.